Abstract



How do we build distributed systems that are secure? Cryptographic techniques can be used 
to secure the communications between physically separated systems, but this is not enough: 
we must be able to guarantee the privacy of the cryptographic keys and the integrity of 
the cryptographic functions, in addition to the integrity of the security kernel and access 
control databases we have on the machines. Physical security is a central assumption 
upon which secure distributed systems are built; without this foundation even the best 
cryptosystem or the most secure kernel will crumble. In this thesis, I address the distributed 
security problem by proposing the addition of a small, physically secure hardware module, 
a secure coprocessor, to standard workstations and PCs. My central axiom is that secure 
coprocessors are able to maintain the privacy of the data they process. 

This thesis attacks the distributed security problem from multiple sides. First, I an- 
alyze the security properties of existing system components, both at the hardware and 
software level. Second, I demonstrate how physical security requirements may be isolated
to the secure coprocessor, and show how security properties may be bootstrapped
using cryptographic techniques from this central nucleus of security within a combined
hardware/software architecture. Such isolation has practical advantages: the nucleus of
security-relevant modules provides additional separation of concerns between functional
requirements and security requirements, and the security modules are more centralized and
their properties more easily scrutinized. Third, I demonstrate the feasibility of the secure
coprocessor approach, and report on my implementation of this combined architecture on top
of prototype hardware. Fourth, I design, analyze, implement, and measure performance of 
cryptographic protocols with super-exponential security for zero-knowledge authentication 
and key exchange. These protocols are suitable for use in security critical environments. 
Last, I show how secure coprocessors may be used in a fault-tolerant manner while still 
maintaining their strong privacy guarantees. 
Chapter 1

Introduction and Motivation 

Is privacy the first roadkill on the Information Superhighway? 1 Will superhighwaymen
waylay new settlers to this electronic frontier?

While these questions may be too steeped in metaphor, they raise very real concerns. 
The National Information Infrastructure (NII) [32] grand vision would have remote computers
working harmoniously together, communicating via an "electronic superhighway,"
providing new informational goods and services for all. 

Unfortunately, many promising NII applications demand difficult-to-achieve distributed
security properties. Electronic commerce applications such as electronic stock brokerage, 
pay-per-use, and metered services have strict requirements for authorization and confi- 
dentiality — providing trustworthy authorization requires user authentication; providing 
confidentiality and privacy of communications requires end-to-end encryption. As a result 
of the need for encryption and authentication, our systems must be able to maintain the 
secrecy of the keys used for encrypting communications, the secrecy of the user-supplied 
authentication data (e.g., passwords), and the integrity of the authentication database against 
which the user-supplied authentication data is checked. Furthermore, hand in hand with 
the need for privacy is the need for system integrity: without the integrity of the system 
software that mediates access to protected objects or the integrity of the access control 
database, no system can provide any sort of privacy guarantee. 

Can strong privacy and integrity properties be achieved on real, distributed systems? 

The most common computing environments today on college campuses and workplaces 
are open computer clusters and workstations in offices, all connected by networks. Physical 
security is rarely realizable in these environments: neither computer clusters nor offices 
are secure against casual intruders, 2 let alone the determined expert. Even if office locks 
were safe, the physical media for our local networks are often but a ceiling tile away — 
any hacker who knows her raw bits can figure out how to tap into a local network using a 
PC. To make matters worse, for many security applications we must be able to protect our 
systems against the occasional untrustworthy user as well as intruders from the outside. 



1 The source of this quote is unclear; one paraphrased version appeared in print, as "If privacy isn't already
the first roadkill along the information superhighway, then it's about to be" [55], and other variants of this 
have appeared in diverse locations. 

2 The knowledge of how to pick locks is widespread; many well-trained engineers can pick office locks [96]. 
Standard textbook treatments of computer security assert that physical security is a
necessary precondition to achieving overall system security. While this may have been a 
requirement that was readily realizable for yesterday's computer centers with their large 
mainframes, it is clearly not a realistic expectation for today's PCs and workstations: their 
physical hardware is easily accessible by both authorized users and malicious attackers 
alike. With complete physical access, the adversaries can mount various attacks: they can 
copy the hard disk's contents for offline analysis; replace critical system programs with 
trojan horse versions; replace various hardware components to bypass logical safeguards, 
etc.

By making the processing power of workstations widely and easily available, we have
made the entire system hardware accessible to interlopers. Without a foundation of physical 
security to build on, logical security guarantees crumble. How can we remedy this? 

Researchers have realized the vulnerability of network wires and other communication 
media. They have brought tools from cryptography to bear on the problem of insecure 
communication networks, leading to a variety of key exchange and authentication protocols 
[25, 27, 30, 59, 67, 78, 80, 93, 98] for use with end-to-end encryption, providing privacy
for network communications. Others have noted the vulnerability of workstations and their
disk storage to physical attacks, and have developed a variety of secret sharing algorithms 
for protecting data from isolated attacks [39, 75, 86]. Tools from the field of consensus 
protocols can be applied as well. Unfortunately, all of these techniques, while powerful, 
still assume some measure of physical security, a property unavailable on conventional
workstations and PCs. The gap between reality and the physical security assumption must
be closed before these techniques can be implemented in a believable fashion.

Can we provide the necessary physical security to PCs and workstations without crip- 
pling their accessibility? Can real, secure electronic commerce applications be built in a 
networked, distributed computing environment? I argue that the answer to these questions 
is yes, and I have built a software/hardware system called Dyad that demonstrates my ideas. 

In this thesis, I analyze the distributed security problem not just from the traditional 
cryptographic protocol viewpoint but also from the viewpoint of a hardware/software sys- 
tem designer. I address the need for physical security and show how we can obtain 
overall system security by bootstrapping from a limited amount of physical security that 
is achievable for workstation/PC platforms — by incorporating a secure coprocessor in a 
tamper-resistant module. This secure coprocessor may be realized as a circuit board on the 
system bus, a PCMCIA 3 card, or an integrated chip; in my Dyad system, it is realized by 
the Citadel prototype from IBM, a board-level secure coprocessor system. 

I analyze the natural security properties inherent in secure coprocessor enhanced com- 
puters, and demonstrate how security guarantees can be strengthened by bootstrapping 
security using cryptographic techniques. Building on this analysis, I develop a combined 
software/hardware system architecture, providing a firm foundation upon which applications
with stringent security requirements can be built. I describe the design of the Citadel
prototype secure coprocessor hardware, the Mach [2] kernel port running on top of it, the
resultant system integration with the host platform, the security applications running on top
of the secure coprocessor, and new, highly secure cryptographic protocols for key exchange
and zero-knowledge authentication. 4

By attacking the distributed security problem from all sides, I show that it is eminently
feasible to build highly secure distributed systems, with bootstrapped security properties
derived from physical security.

The next chapter discusses in detail what is meant by the term secure coprocessor and
the basic security properties that secure coprocessors must possess. Chapter 3 outlines
five applications that are impossible without the security properties provided by secure
coprocessors. Chapter 4 describes the combined hardware/software system architecture of
a secure coprocessor-enhanced host. I consider the basic operational requirements induced
by the demands of security applications and then describe the actual system architecture as
implemented in the Dyad secure coprocessor system prototype. Chapter 5 describes my
new cryptographic protocols, and gives an in-depth analysis of their cryptographic strength.
Chapter 6 addresses the security issues present when initializing a secure coprocessor, and
presents techniques to make a secure coprocessor system fault tolerant. Additionally,
I demonstrate techniques where proactive fault diagnostics may allow some classes of
hardware faults to be detected and permit the replacement of a malfunctioning secure
coprocessor. Chapter 7 shows how both the secure coprocessor hardware and system
software may be verified, and examines the consequences of system privacy breaches.
Chapter 8 gives performance figures for the cryptographic algorithms, the overhead incurred
by crypto-paging, and the raw DMA transfer times for our prototype system. In chapter 9,
I propose challenges for future developers of secure coprocessors.



4 Some of this research was joint work: the design of Dyad, the secure applications, and the new protocols
was done with Doug Tygar of CMU. The basic secure coprocessor model was developed with White, Palmer,
and Tygar. The Citadel system was designed by Steve Weingart, Steve White, and Elaine Palmer of IBM; I
debugged Citadel and redesigned parts of it.
Chapter 2

Secure Coprocessor Model 

A secure coprocessor is a hardware module containing (1) a CPU, (2) bootstrap ROM, 
and (3) secure non-volatile memory. This hardware module is physically shielded from 
penetration, and the I/O interface to the module is the only way to access the internal state 
of the module. (Examples of packaging technology are discussed later in section 2.3.) This 
hardware module can store cryptographic keys without risk of release. More generally, the 
CPU can perform arbitrary computations (under control of the operating system); thus the 
hardware module, when added to a computer, becomes a true coprocessor. Often, the secure 
coprocessor will contain special purpose hardware in addition to the CPU and memory; for 
example, high speed encryption/decryption hardware may be used. 

Secure coprocessors must be packaged so that physical attempts to gain access to the 
internal state of the coprocessor will result in resetting the state of the secure coprocessor 
(i.e., erasure of the secure non-volatile memory contents and CPU registers). An intruder 
might be able to break into a secure coprocessor and see how it is constructed; the intruder 
cannot, however, learn or change the internal state of the secure coprocessor except through 
normal I/O channels or by forcibly resetting the entire secure coprocessor. The guarantees 
about the privacy and integrity of the secure non-volatile memory provide the foundations 
needed to build distributed security systems. 
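
The interface discipline described above can be mimicked in software as a toy. The following Python sketch is purely illustrative and is not taken from any real secure coprocessor: the class name ToySecureCoprocessor, its methods, and the choice of HMAC-SHA256 are all assumptions made for the example. It shows only that keys live in internal state, are used through the I/O methods but never returned, and are erased when the (simulated) sensing circuitry reports an intrusion. Python's underscore convention enforces no real isolation; in an actual module that protection comes from the physical packaging.

```python
import hashlib
import hmac
import secrets


class ToySecureCoprocessor:
    """Toy model of a secure coprocessor (hypothetical, for illustration only)."""

    def __init__(self):
        # Stand-in for secure non-volatile memory; never exposed directly.
        self._nvram = {}

    def generate_key(self, name):
        """Create a secret key inside the module; the key itself is not returned."""
        self._nvram[name] = secrets.token_bytes(32)

    def mac(self, name, message):
        """Use a stored key to compute a keyed checksum without releasing the key."""
        key = self._nvram.get(name)
        if key is None:
            raise KeyError(f"no key named {name!r} (erased or never created)")
        return hmac.new(key, message, hashlib.sha256).digest()

    def on_intrusion_detected(self):
        """Simulated tamper response: erase all secrets held in the module."""
        self._nvram.clear()


coprocessor = ToySecureCoprocessor()
coprocessor.generate_key("audit")
tag = coprocessor.mac("audit", b"log entry 1")   # normal I/O channel
coprocessor.on_intrusion_detected()              # secrets are now gone
```

After the simulated intrusion, any further call to mac("audit", ...) raises KeyError, which mirrors the point that an intruder can force a reset but cannot learn the stored state.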

With a firm security foundation available in the form of a secure coprocessor, greater 
security can be achieved for the host computer. 

2.1. Physical Assumptions for Security

All security systems rely on a nucleus of assumptions. For example, it is often assumed that 
encryption systems are resistant to cryptanalysis. Similarly, I take as axiomatic that secure 
coprocessors provide private and tamper-proof memory and processing. These assumptions 
may be falsified: for example, attackers may exhaustively search cryptographic key spaces. 
Similarly, it may be possible to falsify my physical security axiom by expending enormous 
resources (possibly feasible for very large corporations or government agencies). I rely 
on a physical work-factor argument to justify my axiom, similar in spirit to intractability 
assumptions of cryptography. My secure coprocessor model does not depend on the partic- 
ular technology used to satisfy the work-factor assumption. Just as cryptographic schemes 
may be scaled or changed to increase the resources required to penetrate a cryptographic 
system, current security packaging techniques may be scaled or changed to increase the
work-factor necessary to successfully bypass the secure coprocessor protections. 
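
The scaling half of this analogy can be made concrete with a small calculation on the cryptographic side. The Python sketch below is an illustration only: the function name and the trial rate of 10^9 keys per second are arbitrary assumptions, not figures from this thesis. It estimates the expected time for an exhaustive key search, which on average covers half the key space, and shows that each added key bit doubles the attacker's work.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def expected_search_years(key_bits, trials_per_second):
    """Expected years for exhaustive key search (half the key space on average)."""
    expected_trials = 2 ** (key_bits - 1)
    return expected_trials / trials_per_second / SECONDS_PER_YEAR


# Hypothetical attacker speed; chosen only to make the scaling visible.
ASSUMED_RATE = 1e9

for bits in (56, 64, 80, 128):
    years = expected_search_years(bits, ASSUMED_RATE)
    print(f"{bits:3d}-bit key: about {years:.3g} years")
```

No comparable closed-form estimate exists for physical packaging; the sketch only illustrates the sense in which a work factor can be scaled.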

Chapter 3 shows how to build secure subsystems running partially on a secure copro- 
cessor. 



2.2. Limitations of Model 

Confining all computation within secure coprocessors would ideally suit our security needs, 
but in reality we cannot, and should not, convert all of our processors into secure
coprocessors. There are two main reasons: first, the inherent limitations of physical security 
techniques for packaging circuits; and second, the need to keep the system maintainable. 
Fortunately, as we shall see in chapter 3, we do not need to physically shield the entire 
computer. It suffices to physically protect only a portion of the computer. 

If the secure coprocessor is sealed in epoxy or a similar material, heat dissipation require- 
ments limit us to one or two printed circuit boards. Future developments may eventually 
relax this and allow us to make more of the solid-state components of a multiprocessor 
workstation physically secure, perhaps an entire card cage; however, the security problems 
of external mass storage and networks will in all likelihood remain constant. 

While it may be possible to securely package an entire multiprocessor, it is likely to be
impractical and is unnecessary besides. If we can obtain similar functionalities by placing 
the security concerns within a single coprocessor, we can avoid the cost and maintenance 
problems of making multiple processors and all memory secure. 

Easy maintenance requires modular design. Once a hardware module is encapsulated 
in a physically secure package, disassembling the module to fix or replace some compo- 
nent will probably be impossible. Wholesale board swapping is a standard maintenance / 
hardware debugging technique, but defective boards are normally returned for repairs; with 
physical encapsulation, this will no longer be possible, thus driving up costs. Moreover,
packaging considerations and the extra hardware development time imply that secure
coprocessor technology may lag behind the host system's technology, perhaps by one
generation. The right balance between physically shielded and unshielded components
depends on the class of intended applications. For many applications, only a small portion
of the system must be protected.

What about system-level recovery after a hardware fault? If secrets are kept only within
a single secure coprocessor, having to replace a faulty unit with a different one will
lead to data loss. After we replace a broken coprocessor with a good one, will we be able
to continue running our applications? Chapter 6 presents techniques for periodic checkup
testing and fault tolerant operation of secure coprocessors.
2.3. Potential Platforms

Several physically secure processors exist. This section describes some of these plat- 
forms, giving the types of attacks these systems resist, and system limitations arising from 
packaging technology. 

The µABYSS [103] and Citadel [105] systems employ board-level protection. The
systems include a standard microprocessor (Citadel uses an Intel 80386), some non-volatile
(battery backed) RAM, and special sensing circuitry to detect intrusion into a protective
casing around the circuit board. Additionally, Citadel includes fast (approximately 30
MBytes/sec) DES encryption hardware. The security circuitry erases non-volatile memory 
before attackers can penetrate far enough to disable the sensors or read memory contents. 

Physical security mechanisms must protect against many types of physical attacks.
In the µABYSS and Citadel systems, it is assumed that intruders must be able to probe
through a straight hole of at least one millimeter in diameter to penetrate the system (probe
pin voltages, destroy sensing circuitry, etc.). To prevent direct intrusion, these systems
incorporate sensors consisting of fine (40 gauge) nichrome wire and low power sensing 
circuits powered by a long-lived battery. The wires are loosely but densely wrapped in 
many layers around the circuit board and the entire assembly is then dipped in epoxy. The 
loose and dense wrapping makes the exact position of the wires in the epoxy unpredictable 
to an adversary. The sensing electronics detect open circuits or short circuits in the wires
and erase non-volatile memory if intrusion is attempted. Physical intrusion by mechanical 
means (e.g., drilling) cannot penetrate the epoxy without breaking one of these wires. 

Another attack is to dissolve the epoxy with solvents to expose the sensor wires. To 
block this attack, the epoxy is designed to be chemically "harder" than the sensor wires. 
Solvents will destroy at least one of the wires, and thus create an open circuit, before
the intruder can bypass the potting material and access the circuit board.

Yet another attack uses low temperatures. Semiconductor memories retain state at very 
low temperatures even without power, so an attacker could freeze the secure coprocessor 
to disable the battery and then extract memory contents. The systems contain temperature
sensors which trigger erasure of secrets before the temperature drops below the critical
level. (The system must have enough thermal mass to prevent rapid freezing, by being
dipped into liquid nitrogen or helium, for example, and this places some limitations on
the minimum size of the system. This has important implications for secure smartcard
designers.)

The next step in sophistication is the high-powered laser attack. The idea is to use a
high-powered (ultraviolet) laser to cut through the epoxy and disable the sensing circuitry
before it has a chance to react. To protect against such an attack, alumina or silica is added,
causing the epoxy to absorb ultraviolet light. The generated heat creates mechanical stress,
causing the sensing wires to break.

Instead of the board-level approach, physical security can be provided for smaller,
chip-level packages. Clipper and Capstone [4, 99, 100] are special purpose encryption
chips. These integrated circuit chips are reportedly
designed to destroy key information (and perhaps other important encryption parameters
— the encryption algorithm, Skipjack, is supposed to be secret as well) when attempts are 
made to open the integrated circuit chips' packaging. Similarly, the iPower [58] encryption 
chip by National Semiconductor has tamper detection machinery which causes chemicals 
to be released to erase secure data. The quality of protection and the types of attacks which 
these systems can withstand have not been published.

Smartcards are another approach to physically secure coprocessing [54]. A smartcard 
is a portable, super-small microcomputer. Sensing circuitry is less critical for many ap- 
plications (e.g., authentication, storage of the user's cryptographic keys), since physical 
security is maintained by the virtue of its portability. Users carry their smartcards with 
them at all times and provide the necessary physical security. Authentication techniques 
for smartcards have been widely studied [1, 54]. Additionally, newer smartcard designs 
such as some GEMPlus or Mondex cards [35] feature limited physical security protection, 
providing a true (simple) secure coprocessor. 

The technology envelope defined by these platforms and their implementation parame- 
ters constrains the limits of secure coprocessor algorithms. As the computation power and 
physical protection mechanisms for mobile computers and smartcards evolve, this envelope
will grow.

2.4. Security Partitions

System components of networked hosts may be classified by their vulnerabilities to various
attacks and placed within "native" security partitions. These natural security partitions
contain system components that provide common security guarantees. Secure coprocessors 
add a new system component with fewer inherent vulnerabilities and create a new security 
partition; cryptographic techniques reduce some of these vulnerabilities and enhance secu- 
rity. For example, using a secure coprocessor to boot a system and ensure that the correct 
operating system is running provides privacy and integrity guarantees on memory not oth- 
erwise possible. Public workstations can employ secure coprocessors and cryptography to 
guarantee the privacy of disk storage and provide integrity checks. 

Table 2.1 shows the vulnerabilities of various types of memory when no cryptographic 
techniques are used. Memory within a secure coprocessor is protected against physical 
access. With the proper protection mechanisms, data stored within a secure coprocessor 
can be neither read nor tampered with. A working secure coprocessor can ensure that 
the operating system was booted correctly (see section 3.1) and that the host RAM is
protected against unauthorized logical access. 5 It is not, however, well protected against 
physical access — we can connect logic analyzers to the memory bus and listen passively 

5 I assume that the operating system provides protected address spaces. Paging is performed on either a remote
disk via encrypted network communication (see section 4.1) or a local disk which is immune to all
but physical attacks. To protect against physical attacks for the latter case, we may need to encrypt the data
anyway or ensure that we can erase the paging data from the disk before shutting down.

                                          Vulnerabilities
Subsystem                  Availability               Integrity/Privacy
-------------------------  -------------------------  --------------------------------------
Secure Coprocessor         None                       None
Host RAM                   Online Physical Access     Online Physical Access
Secondary Store            Offline Physical Access    Offline Physical Access
Network (communication)    Online Remote Access       Online Remote Access, Offline Analysis

Table 2.1 Subsystem Vulnerabilities Without Cryptographic Techniques



to memory traffic, or use an in-circuit emulator to replace the host processor and force the 
host to periodically disclose the host system's RAM contents. Furthermore, it is possible 
to use multi-ported memory to remotely monitor RAM. (While it may be impractical to do 
this in a way invisible to users, this line of attack can not be entirely ruled out.) Secondary 
storage may be more easily attacked than RAM since the data can be modified offline; to do
this, however, an attacker must gain physical access to the disk. Network communication
is completely vulnerable to online eavesdropping and offline analysis, as well as online
message tampering. Since networks are used for remote communication, it is clear that
these attacks may be performed remotely.



                                          Vulnerabilities
Subsystem                  Availability               Integrity/Privacy
-------------------------  -------------------------  --------------------------------------
Secure Coprocessor         None                       None
Host RAM                   Online Physical Access     Host Processor Data
Secondary Store            Offline Physical Access    None
Network (communication)    Online Remote Access       None

Table 2.2 Subsystem Vulnerabilities With Cryptographic Techniques

As table 2.2 illustrates, encryption can strengthen [...]
vulnerabilities still exist; however, tampering can be detected by using cryptographic
checksums as long as the checksum values are stored in tamper-proof memory. Note that
the privacy level is a function of the subsystem component using the data. If host RAM 
data is processed by the host CPU, moving the data to the secure coprocessor for encryption 
is either useless or prohibitively expensive [29, 61] — the data must appear in plaintext 
form to the host CPU and is vulnerable to online attacks. However, if the host RAM data is 
serving as backing store for secure coprocessor data pages (see section 4.1.3), encryption is 
appropriate. Similarly, encrypting the secondary store via the host CPU protects that data
against offline privacy loss but not online attacks, whereas encrypting that data within the
secure coprocessor protects that data against online privacy attacks as well, as long as that
data need not ever appear in plaintext form in the host memory.

For example, if we wish to send and read encrypted electronic mail, encryption and 
decryption can be performed by the host processor since the data must reside within both 
hosts for the sender to compose it and for the receiver to read it. But, the exchange of the 
encryption key used for the message should involve secure coprocessor computation: key 
exchange should use secrets that must remain within the secure coprocessor. 6 

2.5. Machine-User Authentication 

How can we authenticate users to machines and vice versa? One solution is smartcards (see 
section 2.3) with zero knowledge protocols (see section 5.1.2).

Another way to verify the presence of a secure coprocessor is to ask a third-party entity
— such as a physically sealed third-party computer — to check the machine's identity for 
the user. This service can also be provided by normal network server machines such as
file servers. Remote services must be difficult to emulate by attackers. Users will notice 
the absence of these services to detect that something is amiss. This necessarily implies 
that these remote services must be available before the users authenticate to the system. 

The secure coprocessor must be present for the remote services to work correctly. 
Evidence that these services work can be conveyed to the user through a secure display 
that is part of the secure coprocessor. If no such display is available, care must be taken to 
verify that the connection to the remote, trusted third-party server is not being simulated by 
an attacker. To circumvent this attack, we must be able to reboot the workstation and rely 
on the local secure coprocessor to perform host system integrity checks. 

Unlike authentication protocols reliant on central authentication servers [81, 80, 93], 
this machine-user authentication happens once, at boot time or session start time. Users 
may be confident that the workstation contains an authentic secure coprocessor if access 
to any normal remote service can be obtained. To successfully authenticate to obtain the 
service, attackers must either break the authentication protocol, break the physical security 



6 This is true even if public key cryptography is used. Public key encryption requires no secrets and may 
be performed in the host; signing the message, however, requires the use of secret values and thus must be 
performed within the secure coprocessor. 
in the secure coprocessor, or bypass the physical security around the remote server. If the
remote service is sufficiently complex, attackers will not be able to emulate it.

2.6. Previous Work 

The secure coprocessor system model is much more sophisticated and comprehensive than 
that found in previous work. It fully examines the natural security boundaries between sub- 
systems in computers and how cryptographic techniques may be used to boost the security 
within these subsystems. The systems of Best [8] and Kent [46] only considered the use
of encryption for copy-protection, and employed physical protection for the main CPU and
primary memory. White and Comerford [104] were the first to consider the use of a security
coprocessor, but their system was targeted for copy-protection and for providing cryptographic
services to the host. New to the secure coprocessor model are security bootstrapping
and crypto-paging, important techniques for building secure distributed systems.
Chapter 3
Applications 

Because secure coprocessors can process secrets as well as store them, they can do much 
more than just keep secrets confidential. I describe how to use secure coprocessors to
realize exemplar secure applications: (1) host integrity verification, (2) tamper-proof audit 
trails, (3) copy protection, (4) electronic currency, and (5) secure postage meters. None of 
these are possible on physically exposed systems. These applications are discussed briefly 
below. 

3.1. Host Integrity Check

Trojan horse software dates back to the 1960s, if not earlier. Bogus login programs are
among the most common, though games and fake utilities were (and are) widely used to
set up back doors as well. Computer viruses exacerbate the problem of host integrity — 
the system may easily be inadvertently corrupted during normal use. 

In the rest of this section, I discuss how secure coprocessors address this problem,
discuss a few alternative solutions, and point out their drawbacks. 

3.1.1. Host Integrity with Secure Coprocessors 

Providing trust in the integrity of a computer's system software is not so difficult if we can 
trust the integrity of the execution of a single program: we can bootstrap our trust in the 
integrity of host software. 7 If we are able to run a single trusted program on the system, we 
can use that program to verify the integrity of the rest of the system. 

Getting that first trusted program running is fraught with problems, even if we ignore 
management and operational difficulties, especially for machines in open clusters or un- 
locked offices. Running an initial trusted program becomes feasible when we add a secure 
coprocessor — the secure coprocessor runs only trusted, unmodified software, and this 
software uses cryptographic techniques to verify the integrity of the host software resident 
on the host's disks. 



7 Bootstrapping security with secure coprocessors is completely different from the security kernels found
in the Trusted Computing Base (TCB) [101] approach: secure coprocessors use cryptographic techniques to
ensure the integrity of the rest of the system, whereas security kernels in a TCB simply assume that the file store
returns trustworthy data.






To verify integrity, a secure coprocessor maintains a tamper-proof database (kept in 
secure non-volatile memory) containing a list of the host's system programs along with their 
cryptographic checksums. A cryptographic checksum function is applied to each executable 
file. The checksums are unforgeable: given a file $F$ and the cryptographic checksum function 
$\mathrm{crypto\_cksm}()$, creating a program $F'$ such that 

$F' \neq F$ and $\mathrm{crypto\_cksm}(F') = \mathrm{crypto\_cksm}(F)$ 

is computationally intractable. The size of the output of a one-way hash function is small 
relative to the input; for example, the MD5 hash function's output is 128 bits [77]. 
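
The checksum database described above can be sketched as follows. This is a minimal illustration, not the thesis implementation: it assumes SHA-256 from Python's hashlib as a stand-in for crypto_cksm (the text cites MD5, which is no longer regarded as collision resistant), and the names build_manifest and verify_host, as well as the plain dictionary standing in for secure non-volatile memory, are hypothetical. 

```python
import hashlib


def crypto_cksm(data: bytes) -> str:
    """Stand-in cryptographic checksum (assumption: SHA-256)."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(files: dict) -> dict:
    """Record one checksum per system program.

    In the design described in the text this table would live in the
    coprocessor's secure non-volatile memory; here it is a plain dict.
    """
    return {name: crypto_cksm(content) for name, content in files.items()}


def verify_host(manifest: dict, files: dict) -> list:
    """Return the sorted names of programs that are missing or modified."""
    bad = []
    for name, expected in manifest.items():
        content = files.get(name)
        if content is None or crypto_cksm(content) != expected:
            bad.append(name)
    return sorted(bad)
```

An empty result from verify_host would correspond to the case where the coprocessor allows the host to boot; any listed name signals tampering with (or removal of) that program. 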

Host integrity checking is different for the cases of stand-alone workstations and networked 
workstations with access to distributed services such as AFS [91] or Athena [5]. 
While publicly accessible stand-alone workstations have fewer avenues of attack, there are 
also fewer options for countering attacks. I examine both cases concurrently. 

Performing the necessary integrity checks with a secure coprocessor can solve the host 
integrity problem. Because of privacy and integrity guarantees on secure coprocessor 
memory and processing, we can have confidence in results from a secure coprocessor that 
checks the integrity of the host's state at boot-up. If the secure coprocessor is first to gain 
control of the system when the system is reset, it can decide whether to allow the host CPU 
to boot after checking the disk-resident bootstrap program, operating system kernel, and all 
system utilities for tampering. 

The cryptographic checksums of system images must be stored in the secure coprocessor's 
secure non-volatile memory and be protected against modification (and sometimes, 
depending on the cryptographic checksum algorithm chosen, against exposure). Of course, 
tables of cryptographic checksums can be paged out to host memory or disk after first 
checksumming and encrypting them within the secure coprocessor; this can be handled as 
an extension to normal virtual memory paging (see section 4.1.3). The secure coprocessor 
can detect any modifications to the system objects and can check the integrity of the external 
storage. 

Along with integrity, secure coprocessors offer privacy; this property allows the use 
of both keyless (such as Rivest's MD5 [77], Merkle's Snefru [56], Jueneman's Message 
Authentication Code (MAC) [44], and IBM's Manipulation Detection Code (MDC) [41]) 
and keyed (such as chained DES [102], and Karp and Rabin's family of fingerprint 
functions [45]) cryptographic checksum functions. All cryptographic checksum functions 
require integrity protection of the cryptographic checksums; keyed checksum functions 
additionally require privacy protection of a key. 

There are no published strong intractability arguments for major keyless cryptographic 
checksum functions; their design appears to be based on ad hoc methods. Keyed cryptographic 
checksum functions require certain information to be kept secret. In the keyed 
case, chained DES keeps encryption keys (which select particular encryption functions) 
secret; Karp-Rabin fingerprint functions use a secret key to select a particular hash function 
from a family of hash functions based on irreducible polynomials 8 over $\mathbb{Z}_2[x]$, i.e., 
$f_k \in F = \{ f_i : p(x) \mapsto p(x) \bmod \mathrm{irred}_i(x) \}$. The resulting residue polynomial is the hash 
result. If the key polynomial is unknown to the adversary, then given input $q(x)$, there is 
no procedure for finding $q'(x)$ where 

$q'(x) \neq q(x)$ and $f_k(q) = f_k(q')$ 

except by chance. The security of Karp-Rabin is equivalent to the probability of two random 
inputs being mapped to the same residue, which is well understood [45, 68]. Chained DES 
is not as well understood as the Karp-Rabin functions, since very little is known about the 
group structure of the permutation group induced by DES encryptions. 

Secure coprocessors can keep keys secret and hence can implement keyed cryptographic 
checksums. The Karp-Rabin fingerprint functions are particularly attractive, since they 
have strong theoretical underpinnings (see section 5.2.5), they are very fast and easy to 
implement 9, and they may be scaled for different levels of security (by using a higher 
degree irreducible polynomial as the modulus). 
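
To make the fingerprint computation concrete, here is a small sketch of reduction modulo a polynomial over Z_2[x]. It is illustrative only: polynomials are encoded as Python integers (bit i is the coefficient of x^i), the function names are hypothetical, and the fixed degree-8 key x^8 + x^4 + x^3 + x + 1 is a toy assumption. A real deployment would pick a secret irreducible polynomial at random and of much higher degree. 

```python
def gf2_mod(p: int, modulus: int) -> int:
    """Remainder of polynomial p modulo `modulus` over Z_2[x].

    Bit i of each integer is the coefficient of x^i, so polynomial
    addition and subtraction are both XOR.
    """
    if modulus < 2:
        raise ValueError("modulus must have degree at least 1")
    deg = modulus.bit_length() - 1
    while p.bit_length() - 1 >= deg:
        # Cancel the leading term of p with a shifted copy of the modulus.
        p ^= modulus << (p.bit_length() - 1 - deg)
    return p


def fingerprint(data: bytes, key_poly: int) -> int:
    """Fingerprint of `data` under the secret polynomial key_poly."""
    # Prepend a 1 byte so that leading zero bytes change the polynomial.
    return gf2_mod(int.from_bytes(b"\x01" + data, "big"), key_poly)


# Toy key (assumption): x^8 + x^4 + x^3 + x + 1, irreducible over Z_2.
TOY_KEY = 0x11B
```

With a degree-8 modulus there are only 256 possible residues, so collisions are easy to find by chance; raising the degree of the modulus is the scaling knob the text refers to. 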

Secure coprocessors simplify the system upgrade problem. This is important when there 
are large numbers of machines on a network; systems can be securely upgraded remotely 
through the network, since the security of communication between secure coprocessors is 
guaranteed. Furthermore, system images are encrypted while being transferred over the 
network and while resident on secondary storage. This provides us with the ability to keep 
proprietary code protected against most attacks. As section 3.3 notes, we can run (portions 
of) the proprietary software only within the secure coprocessor, allowing vendors to have 
execute-only semantics: proprietary software need never appear in plaintext outside of a 
secure coprocessor. 

Section 4.1.1 examines the details of host integrity check as it relates to secure 
coprocessor architectural requirements, and section 4.1.5 and chapter 6 discuss how system 
upgrades are handled by a secure coprocessor. Also relevant is the problem of how the user 
can know if a secure coprocessor is properly running in a system; section 2.5 discusses this. 

3.1.2. Absolute Limits 

So far, we have limited the attackers to using their physical access to corrupt the software 
of the host computer. Is the host integrity problem insoluble if we allow trojan horse 
hardware? Clearly, sufficiently sophisticated hardware emulation can fool both users and 
any integrity checks. There is no completely reliable way for the secure coprocessor to 
detect if an attacker replaced a disk controller with a "double-entry" controller providing 
expected data during system integrity verification but returning trojan horse data (system 
programs) for execution. Similarly, it is hard to detect if the host CPU is substituted with a 
"double-entry" CPU which fails to correctly run specific pieces of code in the OS protection 
system. To raise the stakes, we can have the secure coprocessor do behavior and timing 
checks at random intervals. This makes such "double-entry" hardware emulation difficult 
and forces the hardware hackers to build more perfect trojan horse hardware. 



8 A polynomial is said to be irreducible if it cannot be factored into polynomials of lower degree in the ring of 
polynomials, in this case, $\mathbb{Z}_2[x]$. 

9 Thus their implementation is likely to be correct. 

3.1.3. Previous Work 

Other approaches have been tried without a physical basis for security. This section 
describes these approaches and their shortcomings. 

Authorized Programs 

The host integrity problem can be partially ameliorated by guaranteeing that all programs 
have been inspected and approved by a trusted authority (e.g., a local system administrator 
or computer vendor), but this is an incomplete solution. Guarantees about the integrity 
of source code are not enough [95]: we also need to trust the compilers, editors, and 
other tools we use to manipulate the code. Even if having the trusted authority inspect the 
program's object code is practical, there is no guarantee that the disassembler is not also 
corrupted and hiding all evidence of corruption. 10 

If the object code is built from inspected source code in a clean environment and that 
object code is securely installed into the workstations, we still have little reason to trust 
the machines. Some guarantee must be provided that the software has not been modified 
after installation — after all, we do not know who has had access to the machine since the 
trusted-software installation, and the once clean software may have been corrupted. 

With computers getting smaller (and more portable) and workstations often physically 
accessible in public computer clusters, attackers can easily bypass any logical safeguards 
to corrupt the programs on a computer's disks. Perhaps a trojan horse program has been 
inserted since the last time the host was inspected — how can a user tell if the operating 
system kernel is correct? It is not sufficient to have central authorities that guarantee the 
original copy or inspect the host's software periodically. The integrity of the kernel image 
and system utilities stored on disk must be verified each time the computer is used. 

Diskless Workstations 

In the case of networked "diskless" workstations, integrity verification would appear to be 
confined to the trusted file servers implementing a distributed file system. Any paging to 
implement virtual memory would go across the network to a trusted server with disk storage 
[28, 79, 108]. 

What are the difficulties with this trusted file server model? First, non-publicly readable 
files must be encrypted before being transferred over the network. This implies the ability 
to use secret keys to decrypt these files, and keeping such keys secret in publicly accessible 
workstations is impossible. 



10 This would be similar to the way "stealth" viruses intercept system I/O 
requests and return original, unmodified data to hide the existence of the virus [23]. 

A more serious problem is that the workstations must be able to authenticate the identity 
of the trusted file servers (the host-to-host authentication problem). Since workstations 
cannot keep secrets, we cannot use shared secrets to encrypt and authenticate data between 
the workstation and the file servers. The best that we can do is to have the file servers 
digitally sign the kernel image when we boot over the network, but then we must be able 
to store the public keys of the trusted file servers. With exposed workstations, there is no 
safe place to store this type of integrity information. Attackers can always modify the file 
servers' public keys (and network addresses) stored on the workstation, so it contacts false 
servers. Obtaining public keys from some external key server only pushes the problem one 
level deeper — the workstation would need to authenticate the identity of the key server, 
and attackers need only to modify the stored public key of the key server. 

If we page virtual memory over the network (which cannot reasonably be assumed to be 
secure), the problem only becomes worse. Nothing guarantees the privacy or integrity of the 
virtual memory as it is transferred over the network. If the data is transferred in plaintext, an 
attacker can simply record network packets to break privacy and modify/substitute network 
traffic to destroy integrity. Without the ability to keep secrets, encryption is useless for 
protecting the computer's memory — attackers can obtain the encryption keys by physical 
means and destroy privacy and integrity as before. 

Secure Boot Media 

Several researchers have argued for using a secure-boot floppy containing system integrity 
verification code to bring machines up. This is essentially the approach taken in Tripwire 
[47] and similar systems. 11 Consider the assumptions involved here. 

First, we must assume the host hardware has not been compromised. If the host hardware 
is compromised (see section 3.1.2), the "secure" boot floppy can be ignored or even modified 
when it is used. (Secure coprocessors, on the other hand, cannot be bypassed, especially 
since users will want their machine's secure coprocessor to authenticate its identity.) Next, 
we must fit our boot code, integrity checking code, and cryptographic checksum database 
onto one or two diskettes, and this code must be reasonably fast — this is a pragmatic 
concern, since the integrity checking procedure needs to be easy and fast so users are 
willing to do it every time they start using a machine. 

Secure-boot floppies are widely used on home computers for virus detection. Why isn't 
this approach appropriate for host integrity checking? Virus scanners and host integrity 
checkers have similar integrity requirements: they require a clean environment. Unlike 
integrity checks that detect any modifications made to files, virus scanners typically scan 
for occurrences of suspect code fragments within files. The fragments appearing on the 
list of suspect code fragments are drawn from samples observed in common viruses. It is 
presumed that these code fragments will not occur in "normal" code. 12 The integrity of the 
code fragment list must be protected, just like the database of cryptographic checksums. 
Virus scanners (and general integrity checkers) can bootstrap trust by first verifying that 
a core set of system programs is infection-free (unmodified), and then have those programs 
perform faster, more advanced scanning (full integrity checks) or run-time virus detection 
(protecting the OS kernel). 



11 Because Tripwire checks modifications to system files while running on the host kernel, it is vulnerable to 
"stealth" attacks on the kernel. 

Although virus scanning and host integrity checking have much in common, there are 
some crucial differences. Virus scanners cannot detect modifications to system software 
— they only detect previously identified viruses. Moreover, virus scanners' lists of suspect 
code fragments are independent of machines' software configurations: to update a list 
one adds new suspect code fragments as new viruses are identified. An integrity checker, 
however, must maintain an exact list of the system programs that it should check, along 
with their cryptographic checksums. The integrity of this list is paramount to the correct 
operation of the integrity checker, since attackers (including viruses) can otherwise easily 
corrupt the cryptographic checksum database along with the target program to hide the 
attack. 13 Version control becomes a headache as system software is updated. 

Only trusted users are allowed access to the master boot floppy and untrusted users 
must get a (new) copy of the boot floppy from trusted operators each time a machine is 
rebooted from an unknown state. Users cannot have access to the master boot floppy since 
it must not be altered. Read-only floppies do not help, since we assume that there may be 
untrustworthy users. Careless use (i.e., reuse) of boot floppies becomes another channel of 
attack — boot floppies can easily be made into viral vectors. 

Like diskless workstations, boot floppies cannot keep secrets. Encryption does not 
help, since the workstation or PC must be able to decrypt them, and workstations cannot 
keep secrets (encryption keys) either. The only way to assure integrity without completely 
reloading the system software is to check it by computing cryptographic checksums on 
system images. This is essentially the same procedure used by secure coprocessors, except 
that instead of providing integrity within a piece of secure hardware we use trusted operators. 

Requiring users to obtain a fresh copy of the integrity check software and data each time 
they need to reboot a new machine is cumbersome. If different machines have different 
software releases, then each machine will have a different secure boot floppy. Management 
will be difficult, especially if we wish to revoke usage of programs found to be buggy by 
eliminating their cryptographic checksum from the database to force an update. 

Furthermore, using a centralized database of all the software for all versions of that 
software on the various machines will be an operational nightmare. Any centralized database 
could become a central point of attack. Destruction of this database would disable all secure 
bootstraps. 



12 Thus, virus scanners will have false positive results when these code fragments are found inside of a 
virus-free program. 

13 There are PC-based integrity checkers which append si[...] to the executable files to deter 
attacks; of course, this sort of "integrity check" is easily bypassed. 

Both secure coprocessors and secure boot floppies can be fooled by a sufficiently 
faithful system emulation using a "double-entry" disk to circumvent integrity checks (see 
section 3.1.2), but secure coprocessors allow us to employ more powerful integrity check 
techniques. 

3.2. Audit Trails 

Audit trails must be kept secure to perform system accounting and provide data for intrusion 
detection. The availability of auditing and accounting logs cannot be guaranteed (since the 
entire machine, including the secure coprocessor, may be destroyed). The logs, however, 
can be made tamper evident. This is important for detecting intrusions. Experience shows 
that skilled attackers will attempt to forge system logs to eliminate evidence of penetration 
(see [94] for an interesting case study). The privacy and integrity of the system accounting 
logs and audit trails can be guaranteed simply by holding them inside the secure coprocessor. 
However, it is awkward to have to keep all logs inside the secure coprocessor since they 
can grow very large and resources within the secure coprocessor are likely to be tight. 
Fortunately, it is also unnecessary. 

To provide secure logging, the secure coprocessor crypto-seals the data against tampering 
by using a cryptographic checksum function before storing the data on the file system. 
The sealing operation must be performed within the secure coprocessor, since all keys used 
in this operation must be kept secret. By later verifying these cryptographic checksums 
we make tampering of log data evident, since the probability that an attacker can forge 
logging data to match the original data's checksums is astronomically low. This technique 
reduces the secure coprocessor storage requirement to memory sufficient to store necessary 
cryptographic keys and checksums, typically several words per page of logged memory. If 
the space requirement for the keys and checksums is still too large, they can be similarly 
written out to secondary storage after being encrypted and checksummed by master keys. 
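
The crypto-sealing step can be sketched with a keyed checksum. The sketch below assumes HMAC-SHA256 from the Python standard library as the keyed checksum function (the thesis does not prescribe it), and the class name LogSealer and the per-page sequence number are illustrative additions; the sequence number is included so that reordered pages also fail verification. 

```python
import hashlib
import hmac
import os


class LogSealer:
    """Models the part of the logging path that runs inside the coprocessor."""

    def __init__(self) -> None:
        # Assumption: the sealing key is generated inside the coprocessor
        # and is never exported to the host.
        self._key = os.urandom(32)

    def seal(self, seq: int, page: bytes) -> bytes:
        """Return a keyed checksum binding the page to its sequence number."""
        msg = seq.to_bytes(8, "big") + page
        return hmac.new(self._key, msg, hashlib.sha256).digest()

    def verify(self, seq: int, page: bytes, tag: bytes) -> bool:
        """Recompute the checksum and compare in constant time."""
        return hmac.compare_digest(self.seal(seq, page), tag)
```

Only the 32-byte key and the tags need protected storage; the log pages themselves can sit on the host file system, and a failed verify call makes tampering evident. 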

Additional cryptographic techniques can protect the logs. Cryptographic checksums 
provide the basic tamper detection and are sufficient if only integrity is needed. If 
accounting and auditing logs may contain sensitive information, privacy can be provided using 
encryption. If redundancy is also desired, techniques such as secure quorum consensus 
[39] and secret sharing [86] may be used to distribute the data over the network to several 
machines without greatly expanding the space requirements. 

3.3. Copy Protection 

Software is often charged on a per-CPU, per-site, or per-use basis. Software licenses usually 
prohibit making copies for use on unlicensed machines. This injunction against copying 
is technically unenforceable without a secure coprocessor. If the user can execute code 
on a physically accessible workstation, the user can also read that code. Even if attackers 
cannot read the workstation memory while it is running, we are still relying on 
the assumption that the workstation was booted correctly; verifying this property, as 
discussed above, requires the use of a secure coprocessor. 

3.3.1. Copy Protection with Secure Coprocessors 

Secure coprocessors can protect executables from being copied and illegally used. The 
proprietary code to be protected — or at least some critical portion of it — is distributed 
and stored in encrypted form, so copying without the code decryption key is futile, 14 and 
this protected code runs only inside the secure coprocessor. Either public key or private 
key cryptography may be used to encrypt protected software. If private key cryptography 
is used, key management is still handled by public key cryptography. In particular, when a 
user pays for the use of a program, he sends the certificate of his secure coprocessor public 
key to the software vendor. This certificate is digitally signed by a key management center 
and is prima facie evidence that the public key is valid. The corresponding private key is 
stored only within the secure non-volatile memory of the secure coprocessor; thus, only the 
secure coprocessor will have full access to the proprietary software. Figure 3.1 diagrams 
this process. 
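
The key-management step above (a content key wrapped under the coprocessor's public key) can be illustrated with a deliberately toy example. Everything here is an assumption made for illustration: it uses unpadded textbook RSA with a modulus built from two small Mersenne primes, which is insecure, the function names are hypothetical, and verification of the certificate on the public key is omitted. 

```python
import secrets

# Toy coprocessor key pair (assumption): two Mersenne primes. This modulus
# is far too small, and unpadded RSA is unsafe; for illustration only.
P, Q = 2**61 - 1, 2**89 - 1
N, E = P * Q, 65537
# Private exponent; in the design described it never leaves the coprocessor.
D = pow(E, -1, (P - 1) * (Q - 1))


def vendor_wrap_key(content_key: int, n: int, e: int) -> int:
    """Vendor side: encrypt the software's content key under the public key."""
    if not 0 < content_key < n:
        raise ValueError("content key must be positive and below the modulus")
    return pow(content_key, e, n)


def coprocessor_unwrap_key(wrapped: int) -> int:
    """Coprocessor side: recover the content key with the private exponent."""
    return pow(wrapped, D, N)


# The vendor picks a random 128-bit key for encrypting the software image.
content_key = secrets.randbits(128) | 1
wrapped_key = vendor_wrap_key(content_key, N, E)
```

The wrapped key can travel and be stored alongside the encrypted software; only the holder of the private exponent can recover the content key. The modular inverse via pow(E, -1, m) requires Python 3.8 or later. 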

What if the code size is larger than the memory capacity of the secure coprocessor? 
We have two alternatives: we can crypto-page or we can split the code into protected and 
unprotected segments. 

Section 4.1.3 discusses crypto-paging in greater detail, but the basic idea is to encrypt 
and decrypt virtual memory contents as they are copied between secure memory and external 
storage. When we run out of memory space on the coprocessor, we encrypt the data before 
it is flushed to unsecure external storage, maintaining privacy. Since good encryption chips 
are fast, we can encrypt and decrypt on the fly with little performance penalty. 
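
A rough sketch of the crypto-paging idea follows. It is not the mechanism of section 4.1.3: the cipher is a toy stream cipher built from HMAC-SHA256 in counter mode, standing in for the hardware encryption chip, and the class name CryptoPager, the per-page write counter, and the appended integrity tag are assumptions made so the example is self-contained. 

```python
import hashlib
import hmac
import os

TAG_LEN = 32  # bytes in an HMAC-SHA256 tag


class CryptoPager:
    """Models the coprocessor side of crypto-paging (illustrative names)."""

    def __init__(self) -> None:
        self._enc_key = os.urandom(32)  # both keys stay in secure memory
        self._mac_key = os.urandom(32)
        self._versions = {}             # page number -> write counter

    def _keystream(self, page_no: int, version: int, length: int) -> bytes:
        # Toy stream cipher: HMAC-SHA256 in counter mode, unique per write.
        out, block = b"", 0
        while len(out) < length:
            label = b"%d:%d:%d" % (page_no, version, block)
            out += hmac.new(self._enc_key, label, hashlib.sha256).digest()
            block += 1
        return out[:length]

    def _tag(self, page_no: int, version: int, ciphertext: bytes) -> bytes:
        header = b"%d:%d:" % (page_no, version)
        return hmac.new(self._mac_key, header + ciphertext, hashlib.sha256).digest()

    def page_out(self, page_no: int, plaintext: bytes) -> bytes:
        """Encrypt and seal a page before it leaves secure memory."""
        version = self._versions.get(page_no, 0) + 1
        self._versions[page_no] = version
        stream = self._keystream(page_no, version, len(plaintext))
        ciphertext = bytes(a ^ b for a, b in zip(plaintext, stream))
        return ciphertext + self._tag(page_no, version, ciphertext)

    def page_in(self, page_no: int, blob: bytes) -> bytes:
        """Check the seal, then decrypt a page fetched from external storage."""
        version = self._versions.get(page_no)
        if version is None or len(blob) < TAG_LEN:
            raise ValueError("unknown page or truncated data")
        ciphertext, tag = blob[:-TAG_LEN], blob[-TAG_LEN:]
        if not hmac.compare_digest(tag, self._tag(page_no, version, ciphertext)):
            raise ValueError("page was modified or replayed")
        stream = self._keystream(page_no, version, len(ciphertext))
        return bytes(a ^ b for a, b in zip(ciphertext, stream))
```

The write counter kept in secure memory means a stale copy of a page is rejected as well as a modified one, at the cost of a few bytes of secure storage per page. 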

Splitting the code is an alternative to crypto-paging. We can divide the code into a 
security-critical section and an unprotected section. The security-critical section is en- 
crypted and runs only on the secure coprocessor. The unprotected section runs concurrently 
on the host. An adversary can copy the unprotected section, but if the division is done 
well, he or she will not be able to run the code without the secure portion. In μABYSS 
[104], White and Comerford show how such a partitioning should be done to maximize the 
difficulty of reverse engineering the secure portion of the application. 15 



14 Allowing the encrypted form of the code to be copied means that we can back up the workstation against 
disk failures. Even giving attackers access to the backup tapes will not release any of the proprietary code. 
(Note that our encryption function should be resistant to known-plaintext attacks, since executable binaries 
typically have standardized formats.) A more interesting question arises if the secure coprocessor may fail. 
Section 6.4 discusses this further. 

15 I also examined a real application, gnu-emacs 19.22 [92], to show how it could be partitioned to run partially 
within a secure coprocessor. The X Windows display code and the basic key-press main loop should remain 
within the host for performance. Most of the emacs lisp interpreter (e.g., bytecode.c, callint.c, 
eval.c, lread.c, marker.c, etc.) could be moved into the secure coprocessor and accessed as remote 
procedures. Any manipulation of host-side data (text buffer manipulation, lisp object traversal) required 
during remote procedure calls can be provided by a simple read-write interface (with caching) between the 

[Figure 3.1 diagram not reproduced; surviving label: "Potentially Unsecure Host"] 

Figure 3.1 Copy-Protected Software Distribution 

The software retailer encrypts the copy-protected software with a random key. This key 
is encrypted using the public key of the secure coprocessor within the destination host, so 
only the secure coprocessor may decrypt and run the copy-protected software. The software 
retailer knows that the public key of the secure coprocessor is good, because it is digitally 
signed by the secure coprocessor distributor. 

Whether the proprietary code is split or not, the secure coprocessor runs a small security 
kernel. It provides the basic support necessary to communicate with the host or the host's 
I/O devices. With separate address spaces and a few communication primitives, the 
complexity of a security kernel can be kept low, providing greater assurance that a particular 
implementation is correct. 

3.3.2. Previous Work 

A more primitive version of the copy protection application for secure coprocessors 
appeared in [46, 104]; a secure-CPU approach using oblivious memory references (i.e., 
apparently random patterns of memory accesses), giving a poly-logarithmic slowdown, appears 
in [29] and [61]. The secure coprocessor approach improves on these approaches by 
enabling the protection of large applications, permitting fault-tolerant operation (see section 
6.4), and, when coupled with the electronic currency application described in section 3.4, 
allowing novel methods of charging for use. 

3.4. Electronic Currency 

I have shown how to keep licensed proprietary software encrypted and allow only execute 
access. A natural application is to allow charging on a pay-per-use or metered basis. In 
addition to controlling access to the software according to the terms of a license, some 
mechanism must perform cost accounting, whether it tracks the number of times a program 
has run or tracks dollars in a user's account. More generally, this accounting software 
provides an electronic currency abstraction. Correctly implementing electronic currency 
requires that account data be protected against tampering: if we cannot guarantee integrity, 
attackers might be able to create electronic money at will. Privacy, while perhaps less 
important here, is a property that users expect for their bank balance and wallet contents; 
similarly, electronic money account balances should also be private. 

3.4.1. Electronic Money Models 

Several models can be adopted for handling electronic funds. Any implementation of these 
models should follow the standard transactional model; i.e., to group together operations in 
a transaction having these three properties [33, 34]: 



coprocessor and the host, with interpreter-private data such as catch/throw frames residing entirely within the 
secure coprocessor. Garbage collection does become a problem, since the garbage collector must be able to 
determine if a Lisp object is accessible from the call stack, a portion of which is inside the coprocessor. If we 
chose to hide the actions of the evaluator and keep the stack within the secure coprocessor hidden, this would 
require that the garbage collector code (Fgarbage_collect and its utilities) be moved within the secure 
coprocessor as well. 

1. Failure atomicity. If a transaction's work is interrupted by a failure, any partially 
completed results will be undone. 

2. Permanence. If a transaction completes successfully, the result of its work will never 
be lost, except due to a catastrophic failure. 

3. Serializability. Concurrent transactions may occur, but the results must be the same 
as if they executed serially. This means that temporary inconsistencies that occur 
inside a transaction are never visible to other transactions. 

These transactional properties are requirements for the safe operation of any database, and 
they are absolutely necessary for any electronic money system. 

In the following, I discuss various electronic money models, their security properties, 
and how they can be implemented using present day technology. (I have built an electronic 
currency system on top of Dyad.) 

The first electronic money model is based on the cash analogy. In this model, electronic 
cash has similar properties to cash: 

1. Exchanges of cash can be effectively anonymous. 

2. Cash cannot be created or destroyed except by national treasuries. 

3. Cash transfers require no online central authority. 

(Note that these properties are actually stronger than those provided by real currency: serial 
numbers can be recorded to trace transactions. Similarly, currency can be destroyed.) 

The second electronic money model is based on the credit cards/checks analogy. 
Electronic funds are not transferred directly; rather, promises of payment, cryptographically 
signed to prove authenticity, are transferred instead. A straightforward implementation of 
the credit card model fails to exhibit any of the three properties above. However, by 
applying cryptographic techniques, anonymity can be achieved in a cashier's check-like scheme 
(e.g., Chaum's DigiCash model [16], which lacks transactional properties such as failure 
atomicity; see section 3.4.2), but the latter two requirements (conservation of cash and 
no online central authority) remain insurmountable. Electronic checks must be signed and 
validated at central authorities (banks), and checks/credit payments en route "create" 
temporary money. Furthermore, potential reuse of cryptographically signed checks requires 
that the recipient be able to validate the check with the central authority prior to 
committing to a transaction. 

The third electronic money model is based on the bank rendezvous analogy. This 
model uses a centralized authority to authenticate all transactions and is poorly suited to 
large distributed applications. The bank is the sole arbiter of account balance information 
and can implement the access controls needed to ensure privacy and integrity of the data. 
Electronic Funds Transfer (EFT) services use this model; there are no access restrictions 
on deposits into accounts, so only the person who controls the source account needs to be 
authenticated. 

I examine these models one by one. 

With electronic currency, integrity of accounting data is crucial. We can establish a 
secure communication channel between two secure coprocessors by using a key exchange 
cryptographic protocol (see section 5) and thus use cryptography to maintain privacy when 
transferring funds. To ensure that electronic money is conserved (neither created nor 
destroyed), the transfer of funds should be failure atomic, i.e., the transaction must terminate 
in such a way as to either fail completely or fully succeed: transfer transactions cannot 
terminate with the source balance decremented without having incremented the destination 
balance or vice versa. By running a transaction protocol such as two-phase commit [11, 
22, 106] on top of the secure channel, secure coprocessors can transfer electronic funds 
from one account to another in a safe manner, providing privacy and ensuring that money 
is conserved. Most transaction protocols need stable storage for transaction logging to 
enable the system to roll back when a transaction aborts. On large transaction systems this 
typically has meant mirrored disks with uninterruptible power supplies. With the simple 
transactions needed for electronic currency, the per-transaction log typically is not that large, 
and the log can be truncated after transactions commit and further communications show 
all relevant parties have acknowledged the transaction. Because each secure coprocessor 
handles only a few users, small amounts of stable storage can satisfy logging needs. Because 
secure coprocessors have secure non-volatile memory, we only need to reserve some of this 
memory for logging. The log, accounting data, and controlling code are all protected from 
modification by the secure coprocessor, so account data are safe from all attacks; their only 
threats are bugs and catastrophic failures. Of course, the system should be designed so that 
users have little or no incentive to destroy secure coprocessors that they can access. 
This is natural when one's own balances are stored on a secure coprocessor, much like the 
cash in one's wallet. 
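
The failure-atomic transfer described above can be sketched as a simplified two-phase commit between two purses. This is an in-memory illustration under stated assumptions: the class Purse, its methods, and the transfer function are hypothetical names, the secure channel and crash recovery from the log are not modeled, and the dictionary log stands in for the reserved secure non-volatile memory. 

```python
class Purse:
    """Account state held inside one secure coprocessor (illustrative)."""

    def __init__(self, balance: int) -> None:
        self.balance = balance
        self.log = {}  # txid -> pending delta; stands in for stable storage

    def prepare(self, txid: str, delta: int) -> bool:
        """Phase 1: vote yes only if the change cannot overdraw the purse."""
        pending_debits = sum(d for d in self.log.values() if d < 0)
        if self.balance + pending_debits + delta < 0:
            return False
        self.log[txid] = delta
        return True

    def commit(self, txid: str) -> None:
        """Phase 2: apply the logged change and truncate its log entry."""
        self.balance += self.log.pop(txid)

    def abort(self, txid: str) -> None:
        """Discard the logged change; the balance is left untouched."""
        self.log.pop(txid, None)


def transfer(txid: str, src: Purse, dst: Purse, amount: int) -> bool:
    """Move `amount` from src to dst, or change nothing at all."""
    if amount <= 0:
        raise ValueError("amount must be positive")
    votes = [src.prepare(txid, -amount), dst.prepare(txid, amount)]
    if all(votes):
        src.commit(txid)
        dst.commit(txid)
        return True
    src.abort(txid)
    dst.abort(txid)
    return False
```

Because balances change only in the commit phase, an aborted transfer leaves the total amount of money unchanged, which is the conservation property the text requires. 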

If the secure coprocessor has insufficient memory to hold account data for all the users,
the code and accounting database may be written to host memory or disk after obtaining
a cryptographic checksum (see the discussion of crypto-sealing in section 4.1.3). For the
accounting data, encryption may alternatively be employed since privacy is usually also
desired.
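As a rough illustration of sealing data before it leaves the coprocessor, the sketch below attaches a keyed checksum to a blob and verifies it on the way back in. HMAC-SHA256 is my stand-in for whatever checksum the real system uses, and the `seal` and `unseal` names are hypothetical; the key is assumed to stay inside the secure coprocessor.

```python
import hashlib
import hmac

TAG_LEN = 32  # length of an HMAC-SHA256 tag in bytes


def seal(key: bytes, data: bytes) -> bytes:
    """Prefix data with a keyed checksum before writing it to host storage."""
    tag = hmac.new(key, data, hashlib.sha256).digest()
    return tag + data


def unseal(key: bytes, blob: bytes) -> bytes:
    """Check the checksum of data read back from host storage."""
    tag, data = blob[:TAG_LEN], blob[TAG_LEN:]
    expected = hmac.new(key, data, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("seal check failed: data was modified outside the coprocessor")
    return data
```

Sealing alone gives integrity only; the account data remain readable by the host, which is why the text suggests encryption when privacy is also wanted.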

Note that this type of decentralized electronic currency is not appropriate for smartcards
unless they can be made physically secure from attacks by their owners. Smartcards
are only quasi-physically secure in that their privacy guarantees stem solely from their
portability. Secrets may be stored within smartcards because their users can provide the
physical security necessary. Malicious users, however, can violate smartcard integrity and
insert false data. 16

Secure coprocessor mediated electronic currency transfer is analogous to rights transfer
(not to be confused with rights copying) in a capability-based protection system [107].

16 Newer smartcards such as GEMPLUS or Mondex cards [35] feature limited physical security protection,
though the typ
Using the electronic money (e.g., spending it when running a pay-per-use program) is
analogous to the revocation of a capability.

What about the other models for handling electronic funds? With the credit card/check
analogy, the authenticity of the promise of payment must be established. When the computer
cannot keep secrets for users, there can be no authentication because nothing uniquely
identifies users. Even if we assume that users can enter their passwords into a workstation
without fear of their password being compromised, we are still faced with the problem of
providing privacy and integrity guarantees for network communication. We have similar
problems as in host-to-host authentication in that cryptographic keys need to be somehow
exchanged. If communications are in plaintext, attackers may simply record a transfer of a
promise of payment and replay it to temporarily create cash. While security systems such
as Kerberos [93], if properly implemented [6], can help to authenticate entities and create
session keys, they use a centralized server and have problems similar to those in the bank
rendezvous model. While we can implement the credit card/check model using secure
coprocessors, the inherent weaknesses of this model keep us from taking full advantage
of the security properties provided by secure coprocessors; if we use the full power of
the secure coprocessor model to properly authenticate users and verify their ability to pay
(perhaps by locking funds into escrow), the resulting system would be equivalent to the
cash model.

With the bank rendezvous model, a "bank" server supervises the transfer of funds. While
it is easy to enforce the access controls on account data, this suffers from problems with
non-scalability, loss of anonymity, and easy denial of service from excessive centralization.

Because every transaction must contact the bank server, access to the bank service will
be a performance bottleneck. Banks do not scale well to large user bases. When a bank
system grows from a single computer to several machines, distributed transaction systems
techniques must be brought to bear in any case, so this model has no real advantage over
the use of secure coprocessors in ease of implementation. Furthermore, if a bank's host
becomes inaccessible, either maliciously or as a result of normal hardware failures, no agent 
can make use of any bank transfers. This model does not exhibit graceful degradation with 
system failures. 

The model of electronic cash managed on a secure coprocessor not only can provide
the properties of (1) anonymity, (2) conservation, and (3) decentralization, but it also
degrades gracefully when secure coprocessors fail. Note that secure coprocessor data
may be saved onto disk and backed up after being properly encrypted, and so even the
immediately affected users of a failed secure coprocessor should be able to recover their
balances. The security administrators who initialize the secure coprocessor software will
presumably have access to the decryption keys for this purpose. Careful procedural security
must be used here, both for protection of the decryption key and for checking for double
spending, since dishonest users might attempt to back up their secure coprocessor data,
spend electronic money, and then intentionally destroy their coprocessor in the hopes of
using their electronic currency twice. Fortunately, by using multiple secure coprocessors
(see section 6.4), full secure fault tolerance
the frequency of backups depends on the reliability guarantees desired; in reliable systems,
secure coprocessors may continually run self-tests when idle and warn of impending failures 
(in addition to periodic maintenance checkups and replication). Section 6.3 discusses how 
such self-tests may be done while retaining all security properties. 

The trusted electronic currency manager running in the secure coprocessor uses dis- 
tributed transactions to transfer money and other electronic tokens. Transaction messages 
are encrypted by the secure coprocessor's basic communication layer, providing privacy 
and integrity of communications. Traffic analysis is beyond the scope of this work and is 
not addressed. 

Electronic tokens are created and destroyed by a few trusted programs. For pay-per-use 
applications, the token is created by the vendor's sales program and destroyed by executing 
the application; the exact time of destruction of the token is a vendor design decision,
since runs of application programs are not, in general, transactional in nature.

Because certain privileged applications may create or destroy tokens, each token type 
has a pair of access control lists for token creation and token destruction. These access 
control lists may contain zero-knowledge authentication identities [36] or application IDs: 
trusted applications may run on physically secure hardware (e.g., in a guarded machine
room), or in a secure coprocessor. In the former case, they should have access to the cor-
responding zero-knowledge authenticators and should be able to establish a secure channel
with other electronic currency servers to create and destroy tokens; in the latter case, the
program runs (partially) in a secure coprocessor, and its program text is protected from
modification.
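The pair of access control lists per token type might be modeled as follows. This is only a sketch: `TokenType`, `TokenManager`, and the string principals are hypothetical, and a real manager would identify principals by authenticated identities or application IDs rather than bare strings.

```python
from dataclasses import dataclass, field


@dataclass
class TokenType:
    name: str
    creators: set = field(default_factory=set)    # ACL for token creation
    destroyers: set = field(default_factory=set)  # ACL for token destruction


class TokenManager:
    """Toy currency manager that enforces the two ACLs (illustrative only)."""

    def __init__(self):
        self.balances = {}  # (holder, token name) -> token count

    def create(self, ttype, principal, holder, n=1):
        if principal not in ttype.creators:
            raise PermissionError(f"{principal} may not create {ttype.name}")
        key = (holder, ttype.name)
        self.balances[key] = self.balances.get(key, 0) + n

    def destroy(self, ttype, principal, holder, n=1):
        if principal not in ttype.destroyers:
            raise PermissionError(f"{principal} may not destroy {ttype.name}")
        key = (holder, ttype.name)
        if self.balances.get(key, 0) < n:
            raise ValueError("holder does not own enough tokens")
        self.balances[key] -= n
```

For a pay-per-use application, the vendor's sales program would appear in `creators` and the application itself in `destroyers`.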

Zero-knowledge authenticators (section 5.1.4) running in the secure coprocessor per-
mit the use of more powerful server machines, sidestepping limits (e.g., communication
bandwidth or CPU speeds) imposed on secure coprocessor design by the need for secure
packaging. These server machines must be deployed within a physically secure facility
and special methods must be used to ensure security [101]. Server machines installed in
a secure facility could be as secure as a normal secure coprocessor; however, they need not
run the secure coprocessor kernel, nor would they have access to all secret keys normally 
installed into a secure coprocessor. 

3.4.2. Previous Work 

An alternative to the secure coprocessor managed electronic currency is Chaum's DigiCash
protocol [12, 16]. In such systems, anonymity is paramount, and cryptographic techniques
are used to preserve the secrecy of the users' identities. No physically secure hardware is
used, except in the observers refinement to prevent double spending of electronic money
(rather than detecting it after the fact). 17



17 The observers model employs a physically secure hardware module to detect and prevent double spending.
Chaum's protocol limits information flow to the observer, so that the user need not trust it to maintain privacy;
however, it must be trusted to not destroy money. Secure coprocessors achieve the same goals with greater
flexibility.
Chaum-style electronic currency schemes are characterized by two key protocols. The
first is a blind signature protocol between a user and a central bank. During a withdrawal,
the user obtains a cryptographically signed check that is probabilistically proven to contain
an encoding of the user's identity. The user keeps the values used in constructing the check
secret; they are used later in the spending protocol.

The second protocol is a randomized interactive protocol between a user and a merchant.
The user sends the blind-signed check to the merchant and interactively proves that the check
was constructed appropriately out of the secret values and reveals some, but not all, of those
secrets. The merchant "deposits" to the central bank the blind-signed number and the
protocol log as proof of payment. This interactive spending protocol has a flavor similar
to zero-knowledge protocols in that the answers to the merchant's queries, if answered
for both values of the random coin flips, reveal the user's identity. When double spending
occurs, the central bank gets two logs for the same check, and from this identifies the double
spender.
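The way two spending logs expose a double spender can be sketched with a heavily simplified model. The assumptions are mine: the identity is split into pairs whose XOR is the identity, each coin flip selects which half of a pair is revealed, and the blind signature, the commitments, and the proof that the check is well formed are all omitted. The function names are hypothetical.

```python
import secrets


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def make_check(identity: bytes, k: int = 16):
    """Build k secret pairs (a, b) with a XOR b equal to the user's identity."""
    pairs = []
    for _ in range(k):
        a = secrets.token_bytes(len(identity))
        pairs.append((a, xor(a, identity)))
    return pairs


def spend(pairs, challenge):
    """Reveal one half of each pair, selected by the merchant's coin flips."""
    return [pair[bit] for pair, bit in zip(pairs, challenge)]


def identify_double_spender(log1, log2):
    """Given two (challenge, responses) logs for one check, recover the identity."""
    (c1, r1), (c2, r2) = log1, log2
    for bit1, bit2, x, y in zip(c1, c2, r1, r2):
        if bit1 != bit2:      # both halves of this pair are now known
            return xor(x, y)
    return None               # identical challenges reveal nothing new
```

A single log contains only one random-looking half of each pair, so one honest spend does not identify the user; two spends with independent k-bit challenges differ in at least one position except with probability 2^-k.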

There are a number of problems with this approach. First, any system that provides
complete anonymity is currently illegal in the United States, since any monetary transfer
exceeding $10,000 must be reported to the government [19], employee payments must be
reported similarly for tax purposes [18], stock transfers must be reported to the Securities
and Exchange Commission, etc. Second, in a real internetworked environment, network
addresses are required to establish and maintain a communication channel, barring the 
use of trusted anonymous forwarders — and such forwarding agents are still subject to 
traffic analysis. Providing real anonymity in the high level protocol is useless without 
taking network realities into account. Third, Chaum's cryptographic protocols do not
handle failures, and any systems based on them cannot simultaneously have transactional 
properties and also maintain anonymity and security. A transaction abort in the blind 
signature protocol either leaves the user with a debited account and no electronic check or 
a free check. A transaction abort in the spending protocol either permits the user to falsify
electronic cash if the random coin flips are reused when the transaction is reattempted 
(e.g., the network partition heals), or reveals identifying information to the merchant if new 
random coin flips are generated when the transaction is reattempted. 

Clearly, to provide a realistic distributed electronic currency system, transactional 
properties must be provided. Unfortunately, the safety provided by transactions and the 
anonymity provided by cryptographic techniques appear to be inherently at odds with each 
other, and the tradeoffs made by Chaum-style electronic cash systems for anonymity instead 
of safety are inappropriate for real systems.

Another electronic money system is the Internet Billing Server [88]. This system 
implements the credit card model of electronic currency. A central server acts as a credit
provider for users who can place a spending limit on each authorized transaction, and it
provides billing services to the service providers. The central
server has a complete record of every user's purchases, and the records for the current billing
period are sent to users as part of their bill. Some scaling may be achieved through replication,
but in this case providing hard credit limits requires either distributed transactions or that every
user be assigned to a particular server, making the system non-fault tolerant.

Other approaches include anonymous credit cards [52] or anonymous message for- 
warders to protect against traffic analysis, at the cost of adding centralized servers back to 
the system. 

3.5. Secure Postage 

While cryptographic methods have long been associated with mail (dating back to the use 
by Julius Caesar described in his book The Gallic Wars [15]), they have generally been used 
to protect the contents of a message or, in rare cases, the address on an envelope (protecting
against traffic analysis). In this section, we examine the use of cryptographic techniques to 
protect the stamp on an envelope. 

The US Postal Service, with almost 40,000 autonomous post office facilities, handles
an aggregate total of over 165 billion pieces of mail annually [84]. Most mail is metered
or printed. (Figure 3.2 shows an example of a postage meter indicia.) Traditional postage
meters must be presented to a branch post office to be loaded with postage. The postage
credit is stored in a register sealed in the machine. As each letter is stamped, the amount is
deducted from the machine's credit register. Postal meters are subject to at least four types
of attack: (1) the postage meter recorded credit may be tampered with, allowing the user
to steal postage; (2) the postage meter stamp may be forged or copied; (3) a valid postage
meter may be used by an unauthorized person; and (4) a postage meter may be stolen. 18

With modern facilities for barcoding machine readable digital information, it would be
easy to replace old-fashioned human readable indicia by indicia which are either entirely or 
partially machine readable. These indicia could encode a digitally signed message which 




Figure 3.2 Postage Meter Indicia
Today's metered letters have a simple imprint that can be easily forged.

18 82,000 franking machines in the U.S. are currently reported as lost or stolen [85].
would guarantee authenticity. If this digital information included unique data about the
letter (such as the date mailed, zip codes of the originator and recipient, etc.), the digitally
signed stamp could protect against forged or copied stamps. A rough outline of how such
a system might work was detailed by Pastor [63].

Unfortunately, a digitally signed stamp may be vulnerable to additional types of attack: 

1. If cryptographic systems are misused, the system may be directly attacked. 

2. Even if cryptographic techniques are used correctly, if the adversary has physical
access to the postage meter, he may be able to tamper with the credit register.

3. Even if the credit is tamper-proof, a postage meter may be opened and examined
to discover cryptographic keys, allowing the adversary to build new bogus postage
meters.

4. The protection scheme may depend on a highly available network connecting post
office facilities in a large distributed database. Since 40,000 autonomous post office
facilities exist, such a network would suffer from frequent failures and partitions, cre-
ating windows of vulnerability. (With 165 billion pieces of mail each year, a database
to check the validity of digitally signed metered stamps appears infeasible.)

I outline a protocol for protecting electronic meter stamps, and demonstrate that the use
of a secure coprocessor can address all of the above concerns. With the use of cryptography
and secure coprocessors, both postage meters and their indicia can be made fully secure
and tamper-proof.

3.5.1. Cryptographic Stamps 

A cryptographic postage stamp is an indicia that can demonstrate to the postal authorities 
that postage has been paid. Unlike the usual stamps purchased at a post office, these are 
printed by a conventional output device, such as a laser printer, directly onto an envelope 
or a package. Because such printed indicia can be copied, cryptographic and procedural
techniques must be employed to minimize the probability of forgery.

We use cryptography to provide a crucial property: the stamp depends on the address. 
A malicious user may copy a cryptographic stamp, but any attempts to modify it or the
envelope address will be detected. To achieve this goal, we encrypt (or cryptographically 
checksum) as part of the stamp information relevant to the delivery of the particular piece 
of mail — e.g., the return address and the destination address, the postage amount, and 
class of mail, etc., as well as other identifying information, such as the serial number of
the postage meter, a serial number for the stamp, and the date/time (a timestamp). The
information, including the cryptographic signature or checksum, is put into a barcode. The
barcode must be easily printable by commodity after-market laser printers; it must be
easily scanned and re-digitized at a post office, and it must have sufficient information
density to encode all the bits of the stamp on the envelope within a reasonable amount of
space. Appropriate technologies include Code49 [62], Code16K [43], and PDF417 [42, 65,
66]. Symbol Technologies' PDF417, in particular, is capable of encoding at a density of
400 bytes per square inch, which is sufficient for the size of cryptographic stamps needed
to provide the necessary security in the foreseeable future. Figure 3.3 shows the amount of
information that can be encoded.

Six lines of 40 full ASCII characters for each address, plus four bytes each for the hierarchical
authorization number, the postage meter serial number, the stamp sequence number, the
postage/class, and the time, total at most 500 bytes of data. (Using PDF417, 500 bytes
takes 1.24 square inches.)
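The stamp contents listed above can be laid out as a fixed-width record, as in the sketch below. The field order, the space padding, and the function names are my assumptions. HMAC-SHA256 stands in for the cryptographic signature or checksum; a deployed system that wants anyone to verify stamps without holding the meter's secret would use a public-key signature instead.

```python
import hashlib
import hmac
import struct

ADDR_LEN = 6 * 40  # six lines of 40 ASCII characters per address
TAG_LEN = 32       # HMAC-SHA256 tag length in bytes


def pack_stamp(ret_addr, dest_addr, auth_no, meter_serial, seq_no,
               postage_class, timestamp) -> bytes:
    """Serialize the stamp fields into a fixed 500-byte payload."""
    def pad(addr: str) -> bytes:
        raw = addr.encode("ascii")
        if len(raw) > ADDR_LEN:
            raise ValueError("address longer than six lines of 40 characters")
        return raw.ljust(ADDR_LEN, b" ")

    # Five unsigned 32-bit big-endian integers: 20 bytes in total.
    numbers = struct.pack(">5I", auth_no, meter_serial, seq_no,
                          postage_class, timestamp)
    return pad(ret_addr) + pad(dest_addr) + numbers


def sign_stamp(key: bytes, payload: bytes) -> bytes:
    """Append a keyed checksum so that any change to the payload is detectable."""
    return payload + hmac.new(key, payload, hashlib.sha256).digest()


def verify_stamp(key: bytes, stamp: bytes) -> bool:
    payload, tag = stamp[:-TAG_LEN], stamp[-TAG_LEN:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    return hmac.compare_digest(tag, expected)
```

Because the destination address is part of the checked payload, a copied stamp verifies only when it is attached to the same address, which is the "stamp depends on the address" property.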

The cryptographic signature within the indicia prevents many forms of replay attacks. 
Malicious users will not find it useful to copy the stamps, since the cryptographic signature 
prevents them from modifying the stamp to change the destination addresses, etc., so the
copied stamps may only be used to send more mail to the same destination address. If
duplicate detection is used (see below) then even this threat vanishes. The timestamps and 
serial numbers also limit the scope of the attack by restricting the lifetime of copies and 
permitting law enforcement to trace the source of the attack. 

Because cryptographic stamps also include source information, the postage meter serial
number, and the return address, duplicated stamps can also be detected in a distributed
manner. Replays are detected by logging recent, unexpired indicia from processed mail.
If the post office finds a piece of mail with a duplicate stamp, they will know that some
form of forgery has occurred. We will examine the practicality of replay detection later in
section 3.5.2.

While databases at regional offices can deter replay attacks, we need some way to 
protect the cryptographic keys within the postage meters as well — attackers who gain 
access to the keys can use them to fraudulently sign cryptographic stamps. Preventing
malicious users from accessing cryptographic keys requires physically protected memory
and secure processing of the cryptographic keys. (If a machine does not keep its
computations on cryptographic keys secret, an adversary can place logic analyzer probes to
observe address/data buses and obtain key values. Alternatively, the adversary may replace
the memory subsystem with dual ported memory, and just read the keys as they are used.)
Even password protected, physically secure memory (such as that provided by some
dongles used with PC software) is insufficient: the software must contain the passwords
required to access that protected memory, and even if attackers don't know how to disassemble
the software to obtain the passwords, they can read them off the wires of the parallel port as
the software sends the passwords to enable access.

Private processing of cryptographic keys is a necessary condition for cryptography.
Not only is this a requirement for running real cryptographic protocols, it is also a
requirement for keeping track of the credit amount remaining in an electronic
postage meter register. Protected computation is also required to establish secure chan-
nels of communication for remote (telephone or network) credit update: the electronic
postage meter must communicate with the post office when the user buys more postage,
and cryptographic protocols must be run over the communication lines to prevent foul play.
Secure communication channels require cryptography, and we need a safe place to keep
cryptographic keys and to perform secure computation.

To achieve private, tamper-proof computation, a processor with secure non-volatile
memory for key storage, and perhaps some normal RAM as scratch space (to hold interme-
diates in the calculations), must also be made physically secure. These properties are easily
provided by secure coprocessors.

3.5.2. Software Postage Meters 

By using secure coprocessors in a PC-based system, we can build secure postage meter
software. A PC-based electronic postage meter system would include a secure coprocessor,
a PC (the coprocessor host), a laser printer, a modem, and optionally an optical character
recognition (OCR) scanner and/or a network interface. Like ordinary postage meters, our
PC-based postage meter system would operate in an office environment as a shared resource,
much like laser printers.

The basic idea is simple: the software obtains the destination and return addresses and
the weight and delivery class from the user (either directly from the word processor
running on the user's PC, 19 by reading directly from the envelope and using OCR software,
or by direct keyboard input) and requests a cryptographic stamp from the secure co-
processor. The secure coprocessor [...] and generates a digitally
signed message containing the value of [...] information, the
date, the ID of the secure coprocessor, and other serial numbers. This message (a bit vector)
is sent to the PC, which encodes it and prints a machine readable indicia on the laser printer.
Advanced 2-D bar coding technology such as PDF417 mentioned in section 3.5.1 may be
employed.

19 The word processing software can even provide good weight estimates since it knows the number of pages
in the letter.

Postage Meter Currency Model

Postage credits held within an electronic postage meter are simpler than general electronic 
currency because of their restricted usage. Postage credits must be purchased from a post 
office, and credits can only buy one type of item: cryptographic stamps (or be transferred
to another electronic postage meter). 

We can take advantage of these restrictions to the currency model to achieve solutions 
simpler than those considered in section 3.4. Furthermore, because pieces of mail produced 
by a particular secure coprocessor are likely to be mailed in the same locality, the replay 
detection can be done with much lower overhead than otherwise, as described below. 

Reloading a Meter 

Only post offices may reload postage meters. Unlike their older mechanical brethren, 
electronic postage meter equipment need not be carried to the local post office when the 
amount of credit inside runs low: the local post office can simply provide a phone number
to "recharge" electronic postage meters by modem, paying by credit card numbers or direct 
electronic funds transfer. The USPS meter authenticates the secure coprocessor and uploads 
funds. Meters' communications must be protected by cryptography; otherwise a malicious 
user may record the control signals used to update credit balances and replay that message. 
Encryption also protects businesses' credit card or EFT account numbers from being used
by malicious eavesdroppers.
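One common way to keep a recorded recharge message from being replayed is to bind each authorization to a fresh, single-use nonce chosen by the meter. The sketch below assumes a key shared between the meter's secure coprocessor and the post office and uses HMAC-SHA256 as the authenticator; the source does not specify the actual protocol, and the `Meter` class and `post_office_authorize` function are hypothetical.

```python
import hashlib
import hmac
import secrets


def post_office_authorize(key: bytes, nonce: bytes, amount: int) -> bytes:
    """Post office side: authenticate (nonce, amount) for one recharge."""
    msg = nonce + amount.to_bytes(8, "big")
    return hmac.new(key, msg, hashlib.sha256).digest()


class Meter:
    """Toy credit register living inside the secure coprocessor."""

    def __init__(self, key: bytes):
        self.key = key
        self.credit = 0
        self.nonce = None  # outstanding challenge, if any

    def begin_recharge(self) -> bytes:
        self.nonce = secrets.token_bytes(16)
        return self.nonce

    def apply_recharge(self, amount: int, tag: bytes) -> bool:
        if self.nonce is None:
            return False  # no outstanding challenge: treat as a replay
        expected = post_office_authorize(self.key, self.nonce, amount)
        self.nonce = None  # each nonce is usable at most once
        if hmac.compare_digest(tag, expected):
            self.credit += amount
            return True
        return False
```

A recorded `(amount, tag)` pair is useless later because the meter will have discarded the nonce it was computed over.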

Detecting Replays 

With a kilobyte of data per stamp, it would seem at first that replay detection is infeasible 
because of the size of the database required. However, we can exploit the distributed nature of
mail delivery and sorting. 

The US Postal Service sorts mail twice. First, mail is sorted by destination zip code at a 
site near the source. Then, the mail is delivered (in large batches) to a site associated with 
the destination zip code, where the mail is again sorted, this time by carrier route. Every 
piece of mail destined for the same address passes through the same secondary sorting site, 
making it a natural place for detecting replays. 

Detecting replays locally is feasible with today's technology. Using the 1992 figures of
165 billion pieces of mail per year handled at 600 regional sorting sites, with the simplifying
assumption that the volume of mail is evenly distributed among these regional offices, we
can obtain an estimate of the storage resources required. Assuming that cryptographic
stamps expire six months after printing, 20 an average regional office will see approximately
130,000,000 stamps out of a national total of 80,000,000,000 stamps. If we store one
kilobyte of information per stamp (doubling the above estimate) and assume that the entire
current mail volume uses cryptographic stamps, this would require only 130 gigabytes of
disk storage per facility for logging, well within the capacity of a single disk array system.
The stamps database can be viewed as a sparse boolean matrix indexed in one dimension
by postage meter serial number and in the second dimension by stamp sequence number
for that postage meter. Hashing this matrix into a 256 megabyte hashtable results in a 6%
chance of collision.
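The storage estimate can be reproduced with a few lines of arithmetic. The function names are mine, and the reading of the 6% figure as the fraction of occupied slots in a one-bit-per-slot table is an assumption about how that number was obtained.

```python
def live_stamps_per_site(mail_per_year=165e9, sites=600, expiry_years=0.5):
    """Unexpired stamps an average regional site must remember."""
    return mail_per_year * expiry_years / sites


def log_bytes_per_site(stamps, bytes_per_stamp=1000):
    """Disk space needed to log full stamp records at one site."""
    return stamps * bytes_per_stamp


def hash_table_load(stamps, table_megabytes=256):
    """Fraction of one-bit slots occupied when stamps hash into the table."""
    table_bits = table_megabytes * 2**20 * 8
    return stamps / table_bits


exact = live_stamps_per_site()                 # 137.5 million before rounding
print(f"{exact:.3g} live stamps per site")
print(f"{log_bytes_per_site(130e6) / 1e9:.0f} GB of log per site")
print(f"{hash_table_load(130e6):.1%} of hash slots occupied")
```

The text rounds the six-month national volume down to 80 billion stamps and the per-site share to 130 million; with the rounded figure the log is 130 GB and roughly 6% of the hash slots are occupied.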

To make replay detection even easier, we exploit the physical locality property: pieces 
of mail stamped by a single postage meter are likely to enter the mail processing system at 
the same primary sorting site. Therefore, cryptographic stamps from the same postage meter 
are very likely to be canceled at the same regional office, and we can detect replays there. 
If any cryptographically stamped piece of mail is sent from a different mail cancellation 
site, network connections can be used for real-time remote access of cancellation databases,
or batch processing media such as computer tapes may be used. In the case of real-time 
cancellation, the network bandwidth required depends on the probability of the occurrence 
of such multi-cancellation-site processing, and on how quickly we need to detect replays. 
The canceled stamps database at each regional office need not be large — each postage 
meter can simply write a counter value in its stamps. We need only fast access to a bit 
vector of recently used, unexpired stamp counter values. These bit vectors are indexed 
by the postage meter's serial number and can be compressed by run-length encoding or
other techniques. Only when a replay is detected might we need access to the full routing
information.

The average figure of 130,000,000 stamps tracked by a regional office can now be 
represented as a dense bit vector, since only local postage meters need to be tracked. A fast 
bit-vector representation would require 130 megabits of storage plus indexing overheads,
or just 17 megabytes plus overhead — an amount of storage that can easily fit into an 
average PC. While additional space may be required for indexing to improve throughput 
and for replicated stable storage, the amount of memory required is quite small. 
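A per-meter bit vector of used counter values might look like the following sketch. The `CancellationLog` class is hypothetical; expiry of old counters, compression, and stable storage are left out.

```python
class CancellationLog:
    """One growable bit vector per postage meter, indexed by stamp counter."""

    def __init__(self):
        self.vectors = {}  # meter serial number -> bytearray of bits

    def cancel(self, meter_serial: int, counter: int) -> bool:
        """Record a stamp; return True if it is new, False if it is a replay."""
        vec = self.vectors.setdefault(meter_serial, bytearray())
        byte, bit = divmod(counter, 8)
        if byte >= len(vec):
            vec.extend(b"\x00" * (byte + 1 - len(vec)))
        mask = 1 << bit
        if vec[byte] & mask:
            return False
        vec[byte] |= mask
        return True


def dense_vector_bytes(stamps: int) -> int:
    """Bytes needed at one bit per tracked stamp, before indexing overhead."""
    return (stamps + 7) // 8
```

At one bit per stamp, 130,000,000 tracked stamps occupy 16,250,000 bytes, which matches the "about 17 megabytes" figure once some overhead is allowed for.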



20 The U.S. Postal Service claims to deliver more than 90% of all first-class mail in three days, and more than
99% in seven days. Six months would appear to be a generous bound for mail delivery.
Chapter 4

System Architecture 

I have implemented Dyad, a prototype secure coprocessor system. The Dyad architecture 
is based on operational requirements arising from the security applications in chapter 3. 
However, the hardware modules on which Dyad is built present additional limitations on 
the actual implementation. This chapter starts off with Dyad's abstract system architecture 
based on the operational requirements of a security system during system initialization 
and during normal, steady state operation. Next, I detail the capabilities of our hardware
platform, and describe the architecture of the actual implementation. 

4.1. Abstract System Architecture 

Chapter 3's security applications place requirements and constraints on system structure. 
From these application requirements I arrive at an operational view of how secure copro-
cessor systems should be organized.

4.1.1. Operational Requirements

I begin by examining how a secure coprocessor interacts with the host during system boot 
and then proceed with a description of system services that a secure coprocessor provides to
the host operating system and user software. 

To be sure that a system is securely booted, the bootstrap process must involve secure 
hardware. Depending on the host hardware (e.g., whether a secure coprocessor could halt 
the boot process in case of an anomaly) we may need secure boot ROM. Either the system's 
address space is configured so the secure coprocessor provides the boot vector and the boot 
code directly; or the boot ROM is a piece of secure hardware. In either case, a secure 
coprocessor verifies system software (operating system kernel, system related user-level 
software) by checking the software's signatures against known values, to check that the
version of the software present in external, unsecure, non-volatile store (disk) is the same
as that installed by a trusted party. Note that this interaction has the same problems faced
by two hosts communicating via an unsecure network: if an attacker can completely emulate
the interaction that the secure coprocessor has with a normal host system, it is impossible
for the secure coprocessor to detect this. With secure coprocessor/host interaction, we
can make very few assumptions about the host (it cannot keep cryptographic keys). The
best that we can do is to assume that completely emulating the host at boot
time is prohibitively expensive. (Section 3.1.2 discusses the theoretical limitations to this
approach.)
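The verification step of the boot procedure amounts to comparing a digest of each piece of system software against a value held in secure memory. The sketch below uses SHA-256 as a stand-in hash; the function names and the dictionary layout are assumptions, not the Dyad interface.

```python
import hashlib


def digest(image: bytes) -> str:
    """Hash of a software image as read from unsecure storage."""
    return hashlib.sha256(image).hexdigest()


def verify_boot(images: dict, known: dict) -> list:
    """Return the names of components that are missing or do not match.

    `images` maps component names to bytes read from disk; `known` maps the
    same names to digests recorded in secure memory at installation time.
    """
    bad = []
    for name, expected in known.items():
        image = images.get(name)
        if image is None or digest(image) != expected:
            bad.append(name)
    return bad
```

An empty result means every listed component matches what the trusted party installed; anything else is grounds for halting the boot where the host hardware permits it.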

The secure coprocessor ensures that the system securely boots; after booting, a secure
coprocessor aids the host operating system by providing security functions. A secure
coprocessor does not enforce the host system's security policy; this is the job of the host
operating system. Since we know from the secure boot procedure that a correct operating
system is running, we may rely on the host to enforce policy. When the host system is up
and running, a secure coprocessor provides various security services to the host operating
system:

• integrity verification of any stored data (by secure checksums); 

• data encryption to boost the natural security of storage media (see section 2.4); and

• encrypted communication channels (key exchange, authentication, private key en-
cryption, etc.). 21

4.1.2. Secure Coprocessor Architecture

The boot procedure described above made assumptions about secure coprocessor capabili- 
ties. Let us refine the requirements for secure coprocessor software and hardware. 

To verify that the system software is the correct version, the secure coprocessor must 
have secure memory to store checksums or other data. If keyless cryptographic check-
sums such as MD5 [77], multi-round Snefru [56], or IBM's MDC [41] are one-way hash
functions, then the only requirement is that the memory be protected from unauthorized
writes. Otherwise, we must use keyed cryptographic checksums such as Karp and Rabin's
technique of fingerprinting (see [45] and section 5.1.5). The latter approach requires that
memory also be protected against read access, since both the hash value and the key must
be secret. Similarly, cryptographic operations such as authentication, key exchange, and
secret key encryption all require secrets to be kept. Thus a secure coprocessor must have
memory inaccessible by all entities except the secure coprocessor itself: enough pri-
vate non-volatile memory to store the secrets, plus private (possibly volatile) memory for
intermediate calculations in running protocols.

How much private non-volatile and volatile scratch memory is enough? How fast must 
the secure coprocessor be to have good performance with cryptographic algorithms? There 
are a number of architectural tradeoffs for a secure coprocessor, the crucial dimensions 
being processor speed and memory size. Together, they determine the class of cryptographic 
algorithms that are practical. 



21 Presumably remote hosts will also contain a secure coprocessor, though everything will work fine as long 
as remote hosts follow the appropriate protocols. The final design must take into consideration the possibility 
of remote hosts without secure coprocessors. 
4.1.3. Crypto-paging and Sealing 

Crypto-paging is another technique for trading off memory for speed. A secure coprocessor 
encrypts its virtual memory contents before paging them out to the host's physical memory 
(and perhaps eventually to an external disk), ensuring privacy. We need only enough 
private memory for an encryption key and a data cache, plus enough memory to perform 
the encryption if no encryption hardware is present. To ensure integrity, virtual memory 
contents may be crypto-sealed by computing cryptographic checksums prior to paging out 
and verifying them when paging in. 
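
The page-out and page-in steps just described can be sketched as follows. This is a minimal illustration, not the Dyad implementation: the `CryptoPager` class and its method names are hypothetical, a SHA-256 counter-mode keystream stands in for the DES hardware, HMAC-SHA-256 stands in for the cryptographic checksum, and keeping one seal per page in private memory is an assumption of the sketch.

```python
import hashlib
import hmac
import os


def _keystream(key: bytes, page_no: int, version: int, length: int) -> bytes:
    """Toy keystream (SHA-256 in counter mode); a stand-in for the DES engine."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        header = b"".join(n.to_bytes(8, "big") for n in (page_no, version, counter))
        out.extend(hashlib.sha256(key + header).digest())
        counter += 1
    return bytes(out[:length])


class CryptoPager:
    """Hypothetical model of crypto-paging with crypto-sealing."""

    def __init__(self) -> None:
        # Everything stored on self is assumed to live in private memory.
        self._enc_key = os.urandom(32)
        self._mac_key = os.urandom(32)
        self._seals = {}  # page number -> (version, seal)

    def _tag(self, page_no: int, version: int, ciphertext: bytes) -> bytes:
        header = page_no.to_bytes(8, "big") + version.to_bytes(8, "big")
        return hmac.new(self._mac_key, header + ciphertext, hashlib.sha256).digest()

    def page_out(self, page_no: int, plaintext: bytes) -> bytes:
        """Encrypt and seal a page; the result may be handed to the host."""
        # A fresh version per page-out keeps the keystream from being reused.
        version = self._seals.get(page_no, (0, b""))[0] + 1
        stream = _keystream(self._enc_key, page_no, version, len(plaintext))
        ciphertext = bytes(p ^ s for p, s in zip(plaintext, stream))
        self._seals[page_no] = (version, self._tag(page_no, version, ciphertext))
        return ciphertext

    def page_in(self, page_no: int, ciphertext: bytes) -> bytes:
        """Verify the seal first, then decrypt."""
        version, tag = self._seals[page_no]
        if not hmac.compare_digest(tag, self._tag(page_no, version, ciphertext)):
            raise ValueError("crypto-seal mismatch: page was modified")
        stream = _keystream(self._enc_key, page_no, version, len(ciphertext))
        return bytes(c ^ s for c, s in zip(ciphertext, stream))
```

Binding the page number and a version counter into the seal is one way to reject a stale copy of a page that the host replays; whether a real design would keep such per-page state in private memory is a tradeoff this sketch does not settle.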

Crypto-paging and sealing are analogous to paging of physical pages to virtual memory 
on disk, except for different cost coefficients. Well-known analysis techniques can be 
used to tune such a system [49, 108]. The cost variance will likely lead to new tradeoffs: 
cryptographic checksums are faster to compute than encryption, so providing 
integrity alone is less expensive than providing privacy as well. On the other hand, if the 
computation can reside entirely on a secure coprocessor, both privacy and integrity can be 
provided for free. 

Crypto-paging is a special case of a more general speed/memory trade off for secure 
coprocessors. I observed in [97, 98] that Karp-Rabin fingerprinting can be sped up by 
about 25% on an IBM RT/APC with a 256-fold table-size increase; when implemented in 
assembler on an i386SX the speedup is greater (about 80%; see chapter 8). Intermediate-size 
tables yield intermediate speedups at a slightly higher increase in code size. Similar 
tradeoffs can be found for software implementations of DES. 
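
To make the table-size tradeoff concrete, the sketch below computes the residue of a message polynomial modulo a fixed polynomial over GF(2) in two ways: one bit at a time with no table, and one byte at a time with a 256-entry table. It is an illustration only. The modulus is a fixed degree-32 polynomial chosen for convenience (Karp-Rabin fingerprinting instead draws a random irreducible polynomial and keeps it secret), the function names are hypothetical, and the code says nothing about the measured speedups quoted above.

```python
DEG = 32
POLY = 0x104C11DB7          # illustrative degree-32 modulus, not a secret key
MASK = (1 << DEG) - 1


def fp_bitwise(data: bytes) -> int:
    """Residue of the message polynomial mod POLY, one bit per step (no table)."""
    r = 0
    for byte in data:
        for i in range(7, -1, -1):
            r = (r << 1) | ((byte >> i) & 1)
            if r >> DEG:          # degree reached DEG: subtract (XOR) the modulus
                r ^= POLY
    return r


def _reduce(v: int) -> int:
    """Reduce a polynomial of degree < DEG + 8 modulo POLY."""
    for bit in range(DEG + 7, DEG - 1, -1):
        if (v >> bit) & 1:
            v ^= POLY << (bit - DEG)
    return v


# 256-entry table: TABLE[t] = (t * x^DEG) mod POLY. More memory, fewer steps.
TABLE = [_reduce(t << DEG) for t in range(256)]


def fp_table(data: bytes) -> int:
    """Same residue, one byte per step using the precomputed table."""
    r = 0
    for byte in data:
        top = r >> (DEG - 8)      # the 8 coefficients about to overflow
        r = (((r << 8) & MASK) | byte) ^ TABLE[top]
    return r
```

Both functions return the same value for every input; the table version trades 256 words of memory for an eightfold reduction in loop iterations. Larger tables (for example, indexed by 16 bits) push the same tradeoff further.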

4.1.4. Secure Coprocessor Software 

A small, simple security kernel is needed for the secure coprocessor. What makes Dyad's 
kernel different from other security kernels is the partitioned system structure. 

Like normal workstation (host) kernels, the secure coprocessor kernel must provide 
separate address spaces if vendor and user code is to be loaded into the secure coprocessor. 
Even if we implicitly trust vendor and user code, providing separate address spaces 
helps isolate the effects of programming errors. Unlike the host's kernel, many services 
are not required: terminal, network, disk, and most other device drivers need not be part 
of the secure coprocessor. Indeed, since both the network and disk drives are susceptible 
to tampering, requiring their drivers to reside in the secure coprocessor's kernel is overkill: 
network and file system service requests from secure coprocessor tasks can be forwarded to 
the host kernel for processing. Normal operating system daemons such as printer service, 
electronic mail, etc. are entirely inappropriate in a secure coprocessor. 

The only services that are crucial to the operation of the secure coprocessor are (1) secure 
coprocessor resource management; (2) communications; (3) key management; and (4) 
encryption services. Resource management includes task allocation and scheduling, virtual 
memory allocation and paging, and allocation of communication ports. Communications 
include both communication among secure coprocessor tasks and communication to host 
tasks; it is by communicating with host system tasks [...] 
Key management includes management of authentication secrets, cryptographic keys, and 
system fingerprints of executables and data. With the limited number of services needed, 
we can easily envision using a microkernel such as Mach 3.0 [31], the NT executive [20], 
or QNX [40]. We only need to add a communications server and include a key management 
service to manage secure non-volatile key memory. If the kernel is small, we have more 
confidence that it can be debugged and verified. (In Dyad, we ported Mach 3.0 to run within 
the Citadel secure coprocessor.) 

4.1.5. Key Management 

Key management is a core portion of the secure coprocessor software. Authentication, key 
management, fingerprints, and encryption protect the integrity of the secure coprocessor 
software and the secrecy of private data. The bootstrap loader, in ROM or in secure 
non-volatile memory, controls the bootstrap process of the secure coprocessor itself. In the 
same way that the host-side bootstrapping process verifies the host-side kernel and system 
software, this loader verifies the secure coprocessor kernel before transferring control to it. 

The system fingerprints needed for checking system integrity reside entirely in secure 
non-volatile memory or are protected by encryption while in external storage. (Decryption 
keys reside solely in secure non-volatile memory.) If the latter approach is chosen, new 
private keys must be selected for every new release of system software 22 to prevent replay 
attacks where old, buggy, secure coprocessor software is reintroduced into the system. 
Depending on the algorithm, storage of the fingerprint information requires only integrity or 
both integrity and secrecy. For keyless cryptographic checksums (MD4, MDC, and Snefru), 
integrity is sufficient; for keyed cryptographic checksums (Karp-Rabin fingerprint), both 
integrity and secrecy are required. 

Other protected data held in secure non-volatile memory includes administrative 
authentication information needed to update the secure coprocessor software. We assume 
that a security administrator is authorized to upgrade secure coprocessor software. The 
authentication data for the administrator can be updated along with the rest of the secure 
coprocessor system software; in either case, the upgrade must appear transactional, that is, it 
must have the properties of permanence, where results of completed transactions are never 
lost; serializability, where there is a sequential, non-overlapping view of the transactions; 
and failure atomicity, where transactions either complete or fail such that any partial results 
are undone [2,6, 33, 34]. Non-volatile memory gives us permanence automatically; 
serializability, while important for multi-threaded applications, can be enforced by permitting only 
a single upgrade operation at a time (this is an infrequent operation and does not require 
concurrency); and the failure atomicity guarantee can be provided as long as the secure 
non-volatile memory subsystem provides an atomic store operation. Update transactions 
need not be distributed nor nested; this simplifies the implementation. 
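
One way to obtain these three properties from a single atomic store is a two-bank (shadow copy) layout, sketched below under stated assumptions. The `SecureNVM` class, its two-bank organization, and the simulated `PowerFailure` are all hypothetical; the source only requires that the non-volatile memory subsystem provide an atomic store operation.

```python
import threading


class PowerFailure(Exception):
    """Simulated crash in the middle of an upgrade."""


class SecureNVM:
    """Hypothetical two-bank model of secure non-volatile memory."""

    def __init__(self, initial_image: bytes) -> None:
        self._banks = [initial_image, b""]
        self._active = 0                      # the one word updated atomically
        self._upgrade_lock = threading.Lock()

    def current_image(self) -> bytes:
        return self._banks[self._active]

    def upgrade(self, new_image: bytes, crash_after=None) -> None:
        # Serializability: permit only a single upgrade operation at a time.
        if not self._upgrade_lock.acquire(blocking=False):
            raise RuntimeError("another upgrade is in progress")
        try:
            spare = 1 - self._active
            self._banks[spare] = b""
            for i in range(0, len(new_image), 16):
                if crash_after is not None and i >= crash_after:
                    raise PowerFailure()
                self._banks[spare] += new_image[i:i + 16]
            # Failure atomicity: this single store is the commit point. Before
            # it, the old image is still active; after it, the new one is.
            self._active = spare
        finally:
            self._upgrade_lock.release()
```

A crash before the final store leaves only a partially written spare bank, which is simply overwritten by the next upgrade attempt; the active image is never observed in a half-updated state.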

22 One way is to use a cryptographically secure pseudo-random number generator [9, 10] with its internal state 
entirely in secure non-volatile memory. 
4.2. Concrete System Architecture 



My Dyad prototype secure coprocessor system is realized from several system components. 
To a large extent, it satisfies the system hardware requirements induced by the abstract 
architecture discussed in the previous section. At the highest level, the Dyad prototype is 
a host workstation with special modifications that allow it to talk to a secure coprocessor, 
and the secure coprocessor itself. The prototype host system hardware is an IBM PS/2 Model 
80. The prototype secure coprocessor subsystem is a Citadel coprocessor board [105]. The 
secure coprocessor is attached to the PS/2's microchannel system bus via a Data Translation 
adapter card. The interfaces between these hardware components and the limitations of these 
components influence or constrain some aspects of the system software architecture. 

Both hardware subsystems run the CMU Mach 3.0 microkernel [31]: the host has special 
device drivers to support communication with the coprocessor through the Data Translation 
card, and the coprocessor kernel has special drivers and platform-specific assembly language 
interface code in addition to the machine-independent code. On the host side there 
is additional software for providing interface support for the secure coprocessor. 

The remainder of this section describes the hardware, the host-side system software, 
the coprocessor-side system code, and the application interface. 

4.2.1. System Hardware 

The PS/2 host contains an Intel i386 CPU running at 16 MHz, 16 megabytes of RAM, 
and a microchannel system bus. The Citadel coprocessor contains an Intel i386SX CPU, 
also running at 16 MHz; one megabyte of "scratch" volatile RAM; 64 kilobytes of battery-backed 
secure RAM; bootstrap EPROM with a simple monitor program; 64 kilobytes of 
second-stage bootstrap EEPROM; and an IBM-produced DES chip with a theoretical throughput 
of 30 megabytes per second. 23 All of this is privacy-protected by intrusion detection 
hardware. 

Because the Citadel coprocessor is prototype hardware, it has not been integrated into 
a standard microchannel card. Instead, the coprocessor board is physically external to the 
host and is logically attached to the host's system bus via a Data Translation microchannel 
interface card within the host. (See figure 4.1.) The Data Translation card contains the 
bus drivers, microchannel protocol chips, and bidirectional "command" port I/O, plus some 
simple logic for generating host-side interrupts. The microchannel chips handle arbitration 
for two independent DMA channels, which provide simultaneous input and output to the DES engine. 



23 The coprocessor board's design limits the maximum throughput to 16 Mbyte/sec — an external hardware 
state machine controls the DES chip's operation, and a separate 32 MHz crystal independently clocks this 
state machine. If the control software used zero time, this 16 Mbyte/sec figure would represent the maximum 
attainable encryption throughput for the Citadel board. 
Figure 4.1 Dyad Prototype Hardware 



DES Engine 

The DES engine on the coprocessor board includes input and output FIFO buffer chips 
for the DES chip I/O. Because the DES chip runs on a separate clock, these FIFOs permit 
fast, asynchronous data transfer with hardware interlocks. The data source and sink for the 
FIFOs may be programmed via multiplexors to be one of six sources/sinks: the host (via 
DMA transfers), the coprocessor, and an external bus interface. The external bus interface is 
unused in the present configuration; in the future, it may be connected to network interfaces 
or disk controllers. Furthermore, the DES engine can be configured to work in "cipher 
bypass" (CBP) mode, where data is routed around the DES chip. This permits the use 
of the DMA channels to transfer bulk data between the coprocessor and the host without 
encryption. Figure 4.2 shows the DES engine data paths. 

The Citadel DES engine's I/O multiplexors and the DES chip's encryption/decryption 
mode are configured via a control port accessible on the coprocessor bus. When the host is 
the data source, the DES engine expects that the host has configured its DMA channel to 
transfer data to the input FIFOs, and when the coprocessor system bus is the data source, 
the coprocessor itself will write to the input FIFO via processor I/O instructions. Similarly, 
when the host is the data sink, the DES engine expects the host will DMA-transfer data 
from the output FIFOs to its memory; when the coprocessor is the sink, processor I/O 
instructions are used to read out data from the output FIFO. 

The Data Translation card provides two 16-bit wide DMA channels, giving simultane- 
ous access to the input and output ends of the hardware DES engine, thus allowing the host 
to request host-memory to host-memory DES encryption/decryption operations. This form 
of "filter" I/O operation does not fit the usual Unix/Mach style read/write model; we will 
see below that the driver software handles this as a special case. 
Figure 4.2 DES Engine Data Paths 
The DES engine runs asynchronously; the input and output FIFOs allow the data sources 
and sinks to move data quickly without needing to poll or spend too much time processing 
interrupts. The cipher-bypass multiplexor allows the use of the buffers and DMA control 
logic circuits without engaging the DES chip, allowing dual usage of the DMA hardware. 
Command Ports 

The Citadel-side Dyad kernel uses the DES data path for bulk communication as well as 
encryption; for lower bandwidth communication and controlling the DES engine data path, 
the kernel uses bidirectional command ports provided by the Data Translation card. 

The command ports are 16 bits wide and show up in the I/O address space of both the 
host and the coprocessor; status bits in a separate status register show whether the current 
word is unread, and interrupts may be generated as a result of certain state transitions of 
the command ports. On the host side, individually maskable interrupts may be generated 
whenever a new value is written to the host from the coprocessor, telling the host to read 
that value; or whenever the previous value from the host to the coprocessor was read 
by the coprocessor, telling the host that it may write the next value. Unfortunately, on 
the coprocessor side only one interrupt for command port I/O exists: the coprocessor 
receives a (maskable) interrupt when a new value arrives on the port, but is not informed 
when it may send the next value to the host. As we will see in section 4.2.3, this causes 
some problems with performance in the Dyad kernel implementation. 

Hardware Limitations 

There are several Citadel hardware design limitations which degrade system performance. 
Because the command port does not generate an interrupt when data may be sent from the 
Citadel to the host, the command port throughput is lower than it would be otherwise. The 
coprocessor kernel software polls a status register occasionally to send data to the host, and 
this polling frequency limits the bandwidth. Furthermore, the command port 
is used to send control messages for setting up the DES/DMA data path for high speed 
transfers. This adds extra latency to these transfers. 

The data path between the IBM-designed DES chip and the I/O FIFOs is 16 bits wide. 
Unfortunately, because of a design error in the DES chip, the 16-bit units within a 
block of ciphertext/cleartext must be provided to the chip in reverse order. To work around 
this in hardware, extra "byte flipper" latches are included to reverse words within blocks. 
There is very little penalty in terms of throughput; however, it does add extra latency to 
every transfer, since 6 extra bytes of data must be written to the input FIFOs in order to 
flush out the previous block of data. This cannot always be done by extending the sizes of 
the transfers, since overflowing input buffers in a DMA transfer causes system faults when 
transferring at the end of physical memory. For DMA transfers, since the DMA controller 
needs to be reset/released at the end of the transfer in any case, the extra 6 bytes are written 
via software at the DMA completion interrupt. 
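
The reordering performed by the "byte flipper" latches can be pictured in software as follows. This is only a sketch of the data transformation, assuming 8-byte DES blocks made of four 16-bit units; the function name is hypothetical, and the sketch does not model the latch timing that causes the 6-byte flush.

```python
def reverse_words_in_blocks(data: bytes, block_size: int = 8, word_size: int = 2) -> bytes:
    """Reverse the order of the 16-bit units inside each DES block.

    The bytes inside each 16-bit unit keep their order; only the units
    themselves are reordered within their block.
    """
    if len(data) % block_size != 0:
        raise ValueError("data must be a whole number of blocks")
    out = bytearray()
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]
        words = [block[i:i + word_size] for i in range(0, block_size, word_size)]
        for word in reversed(words):
            out.extend(word)
    return bytes(out)
```

Applying the function twice returns the original data, so the same transformation serves on both the input and the output side of the chip.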

The DMA completion interrupt occurs when the DMA controller transfers the requested 
number of words. Unfortunately, the DMA controller cannot always be reset at this time, 
since for host-to-coprocessor transfers this means only that the input FIFO to the DES 
engine is full, and resetting the DMA controller at this point would confuse the DES engine. 
Similarly, the coprocessor must initialize its DES engine before the host can program the 
DMA controller. Both the DES-engine-completion event and the DES-engine-initialization 
event cause a status bit to change, and the host must poll the status register to detect this. 

Another design flaw in the IBM DES chip causes alternate decryptions to output garbage. 
The driver software compensates by performing a dummy decryption of a small, internal 
buffer after every normal decryption. This imposes extra latency overhead. Some of the 
overhead of the dummy decryption is hidden from host-side applications by performing it 
after the real decryption, since the host-side DMA transfer for the real decryption will 
complete by this point and the dummy decryption may overlap with host-side driver execution 
(releasing the DMA channel, etc.). 
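
The dummy-decryption workaround can be modeled in a few lines. Everything here is hypothetical: `FlakyDecryptor` imitates the flaw with a single-byte XOR "cipher", returns zeros in place of garbage, and assumes that it is every second decryption that fails; the real chip's behavior is only described qualitatively in the text.

```python
class FlakyDecryptor:
    """Toy model of a chip whose alternate decryptions output garbage."""

    def __init__(self, key: int) -> None:
        self._key = key
        self._calls = 0

    def decrypt(self, data: bytes) -> bytes:
        self._calls += 1
        if self._calls % 2 == 0:
            return bytes(len(data))           # stands in for garbage output
        return bytes(b ^ self._key for b in data)


class CompensatingDriver:
    """Follows every real decryption with a dummy one on an internal buffer."""

    def __init__(self, chip: FlakyDecryptor) -> None:
        self._chip = chip
        self._dummy = bytes(8)                # small internal buffer

    def decrypt(self, data: bytes) -> bytes:
        result = self._chip.decrypt(data)     # real decryption: always a good slot
        self._chip.decrypt(self._dummy)       # dummy decryption absorbs the bad slot
        return result
```

Because the dummy call comes after the result is already available, a caller waiting on the real decryption need not wait for it, which mirrors the latency-hiding argument above.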

Yet another limitation is not a design flaw per se: the Data Translation card interface 
does not provide the coprocessor with the ability to become a "bus master" on the host's 
system bus, i.e., the coprocessor may not take over the microchannel, driving the address 
lines and reading or writing memory. Furthermore, the system bus interface provided by this 
card does not provide ABIOS device boot ROM space, which contains code that the host 
processor runs at host boot-up. Because the coprocessor cannot control the host system 
to perform host integrity checks, this prohibits the prototype system's coprocessor from 
performing secure bootstrap of the host and from periodically checking the behavior of 
the host system. This should be repaired in a revised version of the board. 

These hardware idiosyncrasies force some extra complexity in coprocessor kernel soft- 
ware, and make it impossible (currently) to implement secure bootstrapping of the host. 
Fortunately, most of this extra complexity only imposes a slight overall performance 
degradation in the system software, though the DMA transfer rates are much lower than they 
could be otherwise. 

4.2.2. Host Kernel 

The system software on the host contains only one Dyad-specific module: the driver 
needed to use the Data Translation card to talk to the Citadel board. The host kernel driver 
is separated into two parts: two low-level drivers in the Mach microkernel and a higher-level 
driver in the Unix server. The low-level drivers handle interrupts and simple device 
data transfers to the command port and the DMA channels; the high-level driver provides 
an integrated view of the coprocessor as a Unix device, emulating an older driver that I 
wrote for the Mach 2.5 integrated kernel. Figure 4.3 shows the structure of the host-side 
system. 

Microkernel Drivers 

The microkernel drivers handle low-level data transfers, with separate drivers for the command 
port and DMA. The command port I/O is [...] every 16-bit 
word being transferred is usually accompanied by an interrupt. The DMA I/O is viewed as 
a block transfer device, since large chunks of [...] 
Figure 4.3 Host Software Architecture 



Command Port The two microkernel drivers are not independent — the command port 
driver provides hooks for the DMA driver's internal use; the command port is used by 
the DMA driver to synchronize DMA transfers with the coprocessor. When the DMA 
driver needs to use the command port, it checks that all other pending data queued for 
the coprocessor has been sent prior to taking it over. If new I/O requests arrive while the 
command port driver is being used by the DMA driver, they are enqueued separately in the 
command port driver until the DMA driver is done, so that any messages sent by the DMA 
driver will not be interrupted. 

Because of the interactions between the DMA driver and the command port driver, command 
port device operations must guarantee that a successful return from the device remote 
procedure call really means that the data was transferred. Unlike serial line drivers currently 
in Mach 3.0, my port I/O driver does not simply enqueue data from device_write() 
requests into a circular buffer, return immediately with D_SUCCESS, and send the data 
later; rather, it keeps the write requests in queues and generates a reply message only after 
the data has actually been sent to the coprocessor. Similarly, data from the coprocessor are 
read only if device_read() requests have been enqueued, and when 
the DMA driver takes over the command port, the DMA driver may jump this read queue 
to obtain replies. Typically, the Unix server will have only one device_read() request 
pending at any given time for the command port. 

DMA driver The microkernel DMA driver translates device I/O requests into DMA 
transfers to or from the coprocessor DES engine. The DES hardware is integral to every 
DMA transfer, and must be programmed by the coprocessor with the appropriate transfer 
count, DES chip mode, and data source or sink. Prior to a transfer, the DMA driver uses the 
command port to inform the coprocessor of the size and type of the transfer. The encryption 
key and initial vector used for DES operations are also set by the coprocessor, and it is 
assumed that the host processor has arranged with the coprocessor to use the appropriate 
key and initialization vector. 

Associated with the driver is state which is set by calling device_set_status(). 
This state determines whether the driver should be operating in "filter" mode and whether 
the driver expects that the DES engine within the coprocessor will perform an encryption 
DMA transfer or a decryption transfer. This driver state must be consistent with the state 
of the coprocessor kernel. 

The type of DMA transfer with the DES engine depends on whether or not a DMA 
driver read/write operation is in filter mode. The DES engine's I/O FIFOs may be configured 
to both source and sink data via the DMA transfers. In non-filter mode the driver simply 
translates device_write() operations into DMA transfers from the supplied data buffer 
to the DES engine's input FIFO, with the coprocessor bus as the data sink. Similarly, 
device_read() operations are translated into DMA transfers from the DES engine's 
output FIFO, with the coprocessor writing to the DES engine's input. 

When the DMA driver is in filter mode, device_write() and device_read() 
operations must come in identically-sized pairs. The DMA driver assumes that the 
coprocessor will program the DES engine's input and output FIFOs to use DMA transfers to or 
from the host, and on a device_write() operation the driver will internally allocate 
an I/O buffer to hold the filter result. These I/O buffers are then enqueued in the driver, 
with the data being transferred into the result of the matching device_read() operation 
when it comes along. 
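
The pairing rule for filter mode can be sketched as a small queue. The `FilterModeDriver` class below is a hypothetical Python model, not the Mach driver: its `device_write` and `device_read` methods only mirror the names of the Mach calls, and the `transform` callable stands in for the host-memory to host-memory pass through the DES engine.

```python
from collections import deque
from typing import Callable


class FilterModeDriver:
    """Hypothetical model of paired write/read operations in filter mode."""

    def __init__(self, transform: Callable[[bytes], bytes]) -> None:
        self._transform = transform       # stands in for the DES engine
        self._results = deque()           # filter results awaiting their read

    def device_write(self, data: bytes) -> int:
        # The driver allocates a buffer for the filter result and enqueues it.
        self._results.append(self._transform(data))
        return len(data)

    def device_read(self, count: int) -> bytes:
        if not self._results:
            raise RuntimeError("device_read without a matching device_write")
        if len(self._results[0]) != count:
            raise ValueError("paired operations must be identically sized")
        return self._results.popleft()
```

For example, with `transform=bytes.upper`, writing `b"abc"` and then reading 3 bytes yields `b"ABC"`, while reading any other count is rejected.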

Whether a DMA transfer results in an encryption or a decryption by the DES engine is 
important to the DMA driver because the DES engine performs encryption/decryption in 
Cipher Block Chaining (CBC) mode [57]: the previous block of ciphertext is fed back 
into the DES chip as part of the encryption/decryption operation. Since out-of-line data in 
Mach IPC messages are remapped pages of virtual memory, the DMA driver has no control 
over their location in the physical address space, and these pages are likely to be physically 
non-contiguous. Because DMA transfers operate on wired-down physical buffers (virtual 
memory marked as nonpageable), the DMA driver typically cannot DMA-transfer more 
than one physical page at a time. This implies that the driver must check, on a new transfer, 
that the last block of ciphertext from the previous transfer is available to the coprocessor to 
maintain the CBC feedback. 

While there are three transfer modes (encrypt, decrypt, crypto-bypass) and multiple data 
sources and sinks (host or coprocessor), only some of these require special action. These 
are the cases where the host DMA driver has possession of the last block of ciphertext: (1) 
when the operation is a non-filter device_read() and the DES mode is encryption; (2) 
when the operation is a non-filter device_write() and the DES mode is decryption; 
and (3) when the operation is a filter (paired device_write() and device_read()) and 
the DES mode is either encryption or decryption. 
At first glance it may appear that it would be possible to simplify the problem of 
non-physically-contiguous pages by running multiple DMA transfers to the DES engine without 
informing the DES engine that the encryption (or decryption) transfer will be performed in 
pieces. Unfortunately, this doesn't work, since the DMA controller on the host relies on 
the peripheral to generate a DMA completion interrupt, and the Citadel interface generates 
DMA completion based on the DES engine's count register reaching zero. If we tried to 
program the DES engine with the full size of the I/O request but performed smaller, partial 
DMA transfers, the host would not know when a DMA transfer is completed except by 
polling the host-side DMA controller's transfer count register. 

Because user data buffers must be partitioned into DMA transfers of physically con- 
tiguous pages, the DMA driver also sends a DMA start control message (via the command 
port) to the coprocessor kernel specifying the size of the current transfer. This 
control message is sent after the DMA driver has initialized the host-side DMA controller; 
when the coprocessor receives a DMA start message, it then initializes the DES engine, 
which causes the DMA transfer to proceed. 
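
The partitioning and the CBC feedback requirement can be illustrated together. The sketch below splits a buffer at page boundaries and shows that encrypting the pieces separately gives the same ciphertext as one long transfer, provided the last ciphertext block of each piece is carried into the next. The 4096-byte page size, the function names, and the toy block "cipher" (an XOR followed by a byte rotation, which is not DES) are all assumptions made for illustration.

```python
PAGE_SIZE = 4096   # assumed page size
BLOCK = 8          # DES block size in bytes


def split_by_page(vaddr: int, length: int):
    """Return (offset, size) pieces of a buffer, none crossing a page boundary."""
    chunks = []
    offset = 0
    while offset < length:
        room = PAGE_SIZE - ((vaddr + offset) % PAGE_SIZE)
        size = min(room, length - offset)
        chunks.append((offset, size))
        offset += size
    return chunks


def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    """Placeholder keyed permutation on 8-byte blocks; NOT DES."""
    mixed = bytes(a ^ b for a, b in zip(block, key))
    return mixed[1:] + mixed[:1]


def cbc_encrypt(key: bytes, iv: bytes, data: bytes):
    """CBC-encrypt data; also return the last ciphertext block for chaining."""
    prev = iv
    out = bytearray()
    for i in range(0, len(data), BLOCK):
        block = bytes(a ^ b for a, b in zip(data[i:i + BLOCK], prev))
        prev = toy_block_encrypt(key, block)
        out.extend(prev)
    return bytes(out), prev


def cbc_encrypt_by_page(key: bytes, iv: bytes, vaddr: int, data: bytes) -> bytes:
    """Encrypt page-sized pieces, carrying the feedback block between transfers."""
    out = bytearray()
    feedback = iv
    for offset, size in split_by_page(vaddr, len(data)):
        piece, feedback = cbc_encrypt(key, feedback, data[offset:offset + size])
        out.extend(piece)
    return bytes(out)
```

The sketch assumes the buffer address and length are multiples of the block size, so that every piece is a whole number of blocks. If the feedback block were reset to the original initialization vector for each piece, the piecewise result would diverge from the single-transfer result after the first page.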

Unix Server Driver 

Like most Unix server drivers, the Unix server driver for communicating with the secure 
coprocessor is simpler than the microkernel driver. The Unix server driver for coprocessor 
communication provides a more integrated view of the coprocessor than do the 
microkernel-level drivers. It achieves this by using both of the underlying Mach devices via standard 
Mach device remote procedure call primitives. 

Unix-level system calls open(), close(), read(), write(), and ioctl() 
are translated by the Unix server driver into equivalent Mach driver device_open(), 
device_close(), device_read(), device_write(), device_set_status(), 
and device_get_status() remote procedure calls. The read() and write() system 
calls are translated into device_read() and device_write() to the DMA driver 
in the microkernel for bulk data transfer. The Unix server driver provides special ioctl() 
requests to send short control messages via the command port; these are translated to the 
appropriate device messages to the Mach-level coprocessor command port driver. These 
control messages are used to negotiate the type and contents of bulk DMA data transfers, 
for low level control operations with the EPROM boot monitor, and for emulation of a 
console device for the coprocessor kernel. 

Coprocessor Interface 

I wrote a user-level program, cit, to run on the host to download the coprocessor micro- 
kernel into the coprocessor, provide a simple emulated console display, and provide mass 
storage access for the coprocessor kernel once it boots up. The program uses the Unix 
driver to communicate via the command port to the EPROM-resident monitor, and thus 
perform simple diagnostics and download code. 
The cit program also provides the functionality of the display half of a console driver: 
it maintains and updates a memory map of the console display contents and passes 
keyboard input through the command port to the microkernel running on the coprocessor. 
At the same time, it uses the DMA driver to provide a simulated disk drive to the coprocessor 
microkernel, with the disk's DMA control I/O being multiplexed with the console I/O (and the 
lower level, automatic DMA control I/O) over the command port I/O channel. 

Any host-side user-level process that wishes to use the coprocessor's DES engine must 
request those services using interprocess communication with cit. In turn, cit will make 
the appropriate requests (via the command port) to the coprocessor kernel to configure the 
DES engine appropriately. 

4.2.3. Coprocessor Kernel 

The coprocessor runs a Mach 3.0 microkernel that is downloaded by cit. The basic Mach 
3.0 kernel for the AT-bus i386 required significant changes to its low-level interface code, 
in addition to new device drivers. This section outlines my changes. 

Low Level Machine Interface 

When Mach 3.0 is loaded into the Citadel coprocessor, the initial environment provided 
to it by the bootstrap loader differs from that provided in standard i386 AT-bus systems. 
In standard PC systems, the second level bootstrap loader switches the processor to 32-bit 
code and data segments before transferring control to the kernel's entry point, pstart, 24 
in addition to setting machine specific parameters such as conventional/extended memory 
sizes. The coprocessor PROM monitor downloading the kernel runs with 16-bit data and 
code segments, so I had to add new assembler-language code to switch the processor from 
16-bit code/data segments to 32-bit segments at the kernel's startup. 

A more important difference in the low-level environment is that the interrupt subsystem 
is completely different — the Citadel coprocessor does not include a peripheral interrupt 
controller, and interrupt priorities are hard-wired into the system inside a programmable 
logic device. Interrupts may be individually masked. The system provides seven interrupts: 

1. clock (1 kHz), 

2. command port input available, 

3. DES done, 

4. DES input FIFO full, 

5. DES output FIFO empty, 

6. DES input FIFO not full, 

7. DES output FIFO not empty. 

24 The kernel expects to be using physical addresses at this point, thus the name. 



Note that there is no interrupt to indicate that the command port is writable (i.e., the 
previous data element has been read by the host). The coprocessor kernel must poll a status 
port to send data to the host; this testing is done at every interrupt (in interrupt.s), and since 
the clock interrupts at 1 kHz, the maximum additional latency is one millisecond per transferred character. 

The DES input FIFO full and the DES output FIFO empty interrupts were intended 
to allow high throughput coprocessor encryption: a thread could write into the DES input 
FIFO at very high speeds, and switch to reading from the DES output FIFO when an input 
FIFO full interrupt occurs; similarly, when an output FIFO empty interrupt occurs, the 
thread may switch back to writing to the DES input FIFO. 

Console 

The microkernel console driver multiplexes its I/O with I/O from the DMA driver and a
serial-line-style communications driver, com, through the command port. The command
port I/O channel uses the lower 8 bits of the 16-bit wide port for the console and com
driver I/O; high-order bits are set in special command words to switch the command port
channel between console and com driver modes. Low-level DMA negotiation data are sent
with special bit patterns in the high-order bytes, allowing them to interrupt the multiplexed
serial-line data streams at any time without confusion.
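
The multiplexing scheme can be illustrated with a small Python sketch. The text does not
give the actual bit assignments, so the flag values below (MODE_SWITCH_FLAG, MODE_COM_BIT,
DMA_FLAG) and the function name demultiplex are hypothetical, chosen only to show how
high-order bits can separate mode switches and DMA negotiation words from 8-bit character
data.

```python
# Hypothetical bit assignments; the real Citadel command-word layout is not
# specified in the text.
MODE_SWITCH_FLAG = 0x8000  # assumed: word switches the channel mode
MODE_COM_BIT = 0x0001      # assumed: 1 selects com, 0 selects console
DMA_FLAG = 0x4000          # assumed: word carries DMA negotiation data


def demultiplex(words):
    """Split a stream of 16-bit command-port words into console bytes,
    com bytes, and DMA negotiation words."""
    mode = "console"
    streams = {"console": bytearray(), "com": bytearray()}
    dma = []
    for w in words:
        if w & DMA_FLAG:
            # DMA words may arrive at any time without changing the mode.
            dma.append(w & 0x3FFF)
        elif w & MODE_SWITCH_FLAG:
            mode = "com" if w & MODE_COM_BIT else "console"
        else:
            # Plain data: only the low 8 bits are meaningful.
            streams[mode].append(w & 0xFF)
    return streams, dma
```

Because the DMA words are recognized before the current mode is consulted, they can be
interleaved with either character stream, which is the property the text relies on.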

The console subsystem does not provide a separate keyboard driver; the host-side cit
program sends ASCII values to the coprocessor. Special escape sequences are provided to
signal the console driver for access to the kernel debugger.

Microkernel DES Engine Driver 

The DES engine control code within the coprocessor microkernel is not directly accessible
as a driver. Instead, it is an internal device providing multiplexed services to the host
emulated disk driver (hd) and the DES service driver (ds).

Each DES request is packaged in a structure specifying the DMA transfer mode (if any;
a request may also be entirely local to the coprocessor), encryption key and initialization
vector, transfer size, etc. Encoded with each DMA request is also a client-id sent with a
DMA request descriptor to the host via the command port. The requests are read by cit
and acknowledged before initiating the DMA transfer.

Host Emulated Disk

The microkernel contains a host emulated disk driver (hd) which uses the DMA multiplexing
driver to transfer data blocks to/from the host. The entire disk image provided by cit in
the host is encrypted, and my code uses encryption/decryption DMA transfers to access it.
The default pager, using this emulated disk, provides coprocessor-based applications with
truly private virtual memory. (Alternatively, crypto-paging could be performed to a single
encrypted partition, and the remainder of the disk could stay unencrypted.)

The use of crypto-paging to protect the privacy of virtual memory can be inefficient if
the emulated disk block sizes are smaller than the size of a physical page on the host, since
the DMA negotiation and setup overhead would be incurred for partial page transfers. For
efficiency reasons, the emulated disk's block size must be a multiple of the virtual memory
page size in the host. Currently, both the VM page size and the emulated disk block size
are 4 Kbytes.

A simple extension to the encrypted emulated disk would provide multiple disk images
on the host, permitting one or more of them to be used for data sharing with the host (but
not for simultaneous access). In a similar fashion, an emulated network interface may
be provided to the secure coprocessor, allowing the use of NFS [79] and other network
services. In the case of NFS, meta-data (directory information) would not be encrypted.

DES Service Interface 

The DES engine provides another multiplexed service, the DES service driver interface ds.
The ds interface provides the coprocessor applications access to DES operations, including
host-to-host filter mode operations performed on the behalf of host-side applications.

Coprocessor-side applications typically make DES service requests to a crypto-server
(crypt.srv) which is responsible for scheduling access to the DES engine for both the
coprocessor-side applications and the host-side applications that make requests through
cit. The crypt.srv server runs inside the coprocessor, and is the sole client of the
ds driver. While the scheduling decisions and the simple protocols required to implement
them could be performed entirely within the drivers, having the crypto-server implement
scheduling policy outside of the kernel leads to gains in overall flexibility.

The ds driver provides device-level access to the DES engine, with each device
remote procedure call request being serially serviced. The various modes of operation of
the DES engine are set via device_set_status() remote procedure call requests;
device_read() and device_write() remote procedure calls turn into DES operations
involving local coprocessor-resident data. Another special device_set_status()
remote procedure call initiates host-only filter operations.

Secure Memory Interface

Dyad's model of secure coprocessors depends on the availability of privacy-protected
persistent memory. Such protected memory can hold encryption keys, since privacy is
guaranteed in a very strong sense. Similarly, cryptographic checksums are stored in
protected memory: integrity of data is well protected by the system, since only the
coprocessor system software may modify (or reveal) protected memory contents.

Hardware Secure Memory  The Citadel coprocessor system provides 64 kilobytes of
battery-backed memory (secure RAM / non-volatile memory) protected by tamper-detection
circuitry. The circuitry erases memory if any attempt at physical access is detected,
ensuring the privacy of the memory contents. Additionally, 64 kilobytes of EEPROM is
available for persistent (but not necessarily private) storage. Since any attempt at
penetration results in erasure of critical keys required to load the coprocessor system
software, altering EEPROM contents results in catastrophic failure of the Dyad system.

EEPROM contents may be made private by encrypting the EEPROM contents with a 
key kept in secure RAM. 

Secure Memory Service The Dyad secure coprocessor kernel exports secure RAM 
and EEPROM raw access to user applications by permitting applications to map these 
memories into their address space via the mmap primitive on the iopl device. 

We employ a special secure memory server, sec_ram, to provide coprocessor applications
with controlled access to secure RAM and EEPROM via a remote procedure call
interface allowing clients to read/write their own regions of secure memory. (Alternatively,
all coprocessor applications could directly map the secure memory into their own address
space.)

Encapsulating secure memory access using a special secure memory server means
that errors in the user-level applications within the secure coprocessor are unlikely to
corrupt secure memory contents of another application. Furthermore, my sec_ram server
provides the system with the ability to dynamically allocate secure memory among various
coprocessor-side clients; memory compaction to reduce or eliminate fragmentation is also
feasible. Additionally, the memory server can implement the common code required to
make atomic block updates (16-bit word updates of the secure RAM are assumed to be
atomic, since the i386SX uses a 16-bit data bus to write to the secure RAM). Similarly, the
sec_ram server can mask the complexity of the hardware EEPROM update protocol for
the user. 25

The disadvantage of the sec_ram approach is speed, since secure memory accesses
would run several hundred times slower than direct access, depending on the size of memory
accesses over which the remote procedure call overhead is amortized.
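
To make the encapsulation argument concrete, here is a minimal in-memory sketch of a
server that hands each client its own bounds-checked region. It is only a toy model: the
class name SecRamServer, its methods, and the bump-pointer allocation policy are
assumptions made for illustration, and it models neither the remote procedure call
transport nor the atomic-update and EEPROM logic of the real sec_ram server.

```python
class SecRamServer:
    """Toy model of a secure-memory server (hypothetical interface)."""

    def __init__(self, size=64 * 1024):
        self._mem = bytearray(size)   # stands in for the 64 KB secure RAM
        self._regions = {}            # app_id -> (base, length)
        self._next_free = 0

    def allocate(self, app_id, length):
        """Give app_id a private region; repeated calls return the same one."""
        if app_id in self._regions:
            return self._regions[app_id]
        if length <= 0 or self._next_free + length > len(self._mem):
            raise MemoryError("secure RAM exhausted")
        region = (self._next_free, length)
        self._regions[app_id] = region
        self._next_free += length
        return region

    def _check(self, app_id, offset, length):
        base, size = self._regions[app_id]   # KeyError for unknown IDs
        if offset < 0 or length < 0 or offset + length > size:
            raise ValueError("access outside the caller's region")
        return base + offset

    def write(self, app_id, offset, data):
        start = self._check(app_id, offset, len(data))
        self._mem[start:start + len(data)] = data

    def read(self, app_id, offset, length):
        start = self._check(app_id, offset, length)
        return bytes(self._mem[start:start + length])
```

Every access passes through _check, so a buggy client can damage only its own region; the
price is the per-call overhead discussed above.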

Cryptographic keys are kept in the secure memory by the sec_ram server for the
various coprocessor applications. Note that applications must have unique IDs for allocating
and accessing secure memory from the sec_ram server. These IDs are also persistent
quantities, since all runs of the same application should access the same data private to
the application. Because applications have no access to any persistent memory (other than
their own instructions) before contacting the sec_ram server, and external non-encrypted
storage is vulnerable, there is a bootstrapping problem. We can solve this problem by
binding the secure memory access ID with the application at compile time, since coprocessor
application binaries are guaranteed their integrity by cryptographic checksum verification.



25 Making EEPROM updates atomic is harder, since we do not have atomic writes. An entire
sixty-four byte page of the Xicor EEPROM used by Citadel must be updated in a single step,
and each page mode update requires up to 10 ms for the write cycle to complete. Secure RAM
can be used to provide a directory into the EEPROM and preserve the appearance of atomic
updates of the 64-byte pages.

There is no need to protect the privacy of these ID values, since they only refer to secure
memory regions. A drawback is that static allocation of the IDs implies that external
ID granting authorities must exist. Because these IDs do not have to be contiguous, the
granting authorities may be distributed (much as physical Ethernet addresses are currently
allocated). This aspect of application installation ties in with system bootstrapping and
maintenance, discussed in chapter 6.

Chapter 5

Cryptographic Algorithms/Protocols 

This chapter discusses and analyzes the key algorithms used in Dyad. 26 The notation used 
is standard from number theory and algebra (groups, rings, and fields). 

In addition to the zero-knowledge authentication and key exchange algorithms below,
Dyad uses public key signatures and public key encryption [78] (e.g., for copy-protected
software distribution). In lieu of the zero-knowledge authentication and key exchange
algorithms presented here, RSA or Diffie-Hellman key exchange [25] could be used.
RSA and Diffie-Hellman have weaker theoretical underpinnings; for example, RSA is
known to leak information (the Jacobi symbol) [51], and our zero-knowledge authentication
scheme provably does not. Similarly, in lieu of Karp-Rabin fingerprinting, other
cryptographic checksum algorithms such as Rivest's MD5 [77], Merkle's Snefru [56], Jueneman's
Message Authentication Code (MAC) [44], IBM's Manipulation Detection Code (MDC)
[41], or chained DES [102] could be used. Primes needed in the key exchange algorithm,
the authentication algorithm, and the two merged key exchange/authentication algorithms
may be generated using known probabilistic algorithms such as Rabin's [70].

There are two main sections in this chapter. Section 5.1 describes all of the algorithms 
in detail. A programmer should be able to reimplement the protocols from this part alone. 
Section 5.2 revisits the algorithms and provides an analysis of their cryptographic properties. 

5.1. Description of Algorithms

Before the description of my algorithms, I define some terms that will be used throughout 
this section. 

A number $M$ is said to be a Blum modulus when $M = P \cdot Q$, and $P$, $Q$ are primes of
the form $4k + 3$. Moduli of this form are said to have the Blum property. Blum moduli have
special number-theoretic properties that I will use in my protocols.

A value is said to be a nonce value if it is randomly selected from a set $S$ and is used
once in a run of a protocol. The nonce values that we will use are usually selected from
$\mathbb{Z}^*_M$, where $M$ is a Blum modulus. 27



26 This chapter is a slightly revised version of my paper [98]. These algorithms first
appeared in the Strongbox system.

27 $\mathbb{Z}^*_n$ denotes the integers modulo $n$ that are relatively prime to $n$,
considered as a group with multiplication as the group operator.

5.1.1. Key Exchange

End-to-end encryption of communication channels is mandatory when channel security
is suspect. To do this efficiently, I use private-key encryption coupled with a public-key
encryption algorithm used for key exchange. I first describe the public-key algorithm.

What properties do we need in a public-key encryption algorithm? Certainly, we want
assurances that inverting the ciphertext without knowing the key is difficult. To show that
inverting the ciphertext is difficult, we often show that breaking a cryptosystem is equivalent
to solving some other problem that we believe to be hard. For example, Rabin showed
that breaking his encryption algorithm is equivalent to factoring large composite numbers,
which number theorists believe to be intractable [67]. Unfortunately, Rabin's system is
brittle, i.e., if the user's program (or other hardware/software agents working on the user's
behalf) can be made to decrypt ciphertext chosen by an attacker, it would be easy for the
attacker to subvert the system, divulging the secret keys. The RSA encryption algorithm [78],
while believed to be strong, has not been proven secure. Chor [17] showed that if an attacker
can guess a single bit of the plaintext when given the ciphertext with an accuracy of more
than $1/2 + \epsilon$, then the attacker can invert the entire message. Depending on your
point of view, this could be interpreted to mean either that RSA is strong in that not a single
bit of the plaintext is leaked, or that RSA is weak in that all it takes is one chink in its armor
to break it. The public-key cryptosystem used in Dyad is based on the problem of deciding
quadratic residuosity, another well-known number-theoretic problem that is believed to be
intractable.

When a connection is established between a client and a server, the two exchange
a secret, randomly generated DES key using a public key system. Because private key
encryption is much cheaper, we use the DES key to encrypt all other traffic between the
client and the server.

The public key system works as follows: All entities in the system publish via a white
pages server their moduli, $\{M_i\}$, where each $M_i$ is a Blum modulus. The factorization
of $M_i$, of course, is known only to the entity corresponding to $M_i$, and is kept secret.

Observe that Blum moduli have the property that the multiplicative group $\mathbb{Z}^*_{M_i}$
has $-1$ as a quadratic non-residue. To see this, let $L(a,p)$ denote the Legendre symbol,
which is defined as

    $L(a,p) = \begin{cases} 1 & \text{if } a \text{ is a quadratic residue, i.e., if } \exists x : x^2 \equiv a \pmod{p} \\ -1 & \text{otherwise} \end{cases}$



where $p$ is prime and $a \in \mathbb{Z}^*_p$. Now, we are going to use two important
identities involving the Legendre symbol: 28

    $L(-1, p) = (-1)^{(p-1)/2}$    (5.1)

    $L(mn, p) = L(m, p) \, L(n, p)$    (5.2)

When $p = 4k + 3$, from (5.1) we have $L(-1, p) = (-1)^{2k+1} = -1$, so $-1$ is a quadratic
non-residue in $\mathbb{Z}^*_p$. This suffices to show that $-1$ is a quadratic non-residue in
$\mathbb{Z}^*_{M_i}$, since if there is a root $r$ such that $r^2 \equiv -1 \bmod M_i$, then
$(r \bmod p)$ must be a square root of $-1$ in $\mathbb{Z}^*_p$ as well, where $p$ is a prime
factor of $M_i$.

28 See [60] for a list of identities involving the Legendre symbol.

The property that $-1$ is a quadratic non-residue makes it easy to generate random
quadratic residues and non-residues: simply choose a random 29 $r \in \mathbb{Z}^*_{M_i}$ and
compute $r^2 \bmod M_i$. If we want a quadratic residue, use $r^2 \bmod M_i$; if we want a
quadratic non-residue, use $-r^2 \bmod M_i$.
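
As a sanity check on this construction, the following Python sketch generates residues and
non-residues modulo a toy Blum modulus and tests them with Euler's criterion, one standard
way to evaluate the Legendre symbol when the prime is known. The function names and the
tiny primes in the usage note are illustrative assumptions, not part of Dyad.

```python
import math
import secrets


def legendre(a, p):
    """Legendre symbol L(a, p) for an odd prime p, via Euler's criterion."""
    s = pow(a % p, (p - 1) // 2, p)
    return -1 if s == p - 1 else s


def random_unit(m):
    """Pick a random element of Z*_m (retry on the rare non-unit)."""
    while True:
        r = secrets.randbelow(m - 1) + 1
        if math.gcd(r, m) == 1:
            return r


def random_residue(m):
    """r^2 mod m is always a quadratic residue."""
    r = random_unit(m)
    return r * r % m


def random_nonresidue(m):
    """For a Blum modulus m, -r^2 mod m is a non-residue modulo both primes."""
    return (-random_residue(m)) % m
```

With $P = 7$ and $Q = 11$ (both of the form $4k + 3$), every value returned by
random_nonresidue(77) has Legendre symbol $-1$ modulo 7 and modulo 11.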

Therefore, given $n = p \cdot q$ where both $p$ and $q$ are primes of the form $4k + 3$, it
is easy to generate random quadratic residues and quadratic non-residues. Next, note another
property that will enable us to decode messages: the Legendre symbol can be efficiently
computed using an algorithm similar to the Euclidean gcd algorithm. Note that this likewise
holds for the generalization of the Legendre symbol, the Jacobi symbol, defined by
$J(n, m) = \prod_i L(n, p_i)$ where $m = \prod_i p_i$ and the $p_i$ are the prime factors of
$m$. The value of the Jacobi symbol can be efficiently calculated without knowing the
factorization of the numbers.
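
For concreteness, here is a sketch of the usual gcd-like procedure for the Jacobi symbol,
based on quadratic reciprocity and the rule for factors of two. It is the textbook
algorithm rather than code taken from Dyad, and the function name jacobi is an assumption.

```python
def jacobi(a, n):
    """Jacobi symbol J(a, n) for odd n > 0, without factoring n."""
    if n <= 0 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer")
    a %= n
    result = 1
    while a != 0:
        # Pull out factors of two: J(2, n) = -1 exactly when n = 3 or 5 mod 8.
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        # Quadratic reciprocity: swapping flips the sign iff both are 3 mod 4.
        a, n = n, a
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0
```

The loop has the same shape as Euclid's algorithm, so the running time is polynomial in
the number of bits of the inputs. Note that jacobi(76, 77) is 1 even though $-1$ is a
non-residue modulo 77.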

The following approach was described in [30]. Suppose a client wants to establish a
connection to the server corresponding to $M_i$. The client first randomly chooses a DES key
$k$, which will be sent to the server using the public key system. The client then decomposes
the message into a sequence of single bits, $b_0, b_1, \ldots, b_m$. Now, for each bit $b_j$ of
the message, the client computes $x_j = (-1)^{b_j} r_j^2 \pmod{M_i}$, where the $r_j$ are random
numbers (nonce values). The receiver $i$ can compute $L(x_j, P_i)$ to decode the bit stream,
since he knows the factorization of $M_i$: $b_j = 0$ if the symbol is $1$, and $b_j = 1$ if it
is $-1$. Note that while the Jacobi symbol, the generalization of the Legendre symbol, can be
quickly computed without knowing the factorization of $M_i$, it does not aid the attacker. We
see from

    $\begin{aligned} J(-r^2, M_i) &= J(-1, M_i) \, J(r^2, M_i) \\ &= (-1) \cdot (-1) \cdot J(r^2, M_i) \\ &= J(r^2, M_i) \end{aligned}$

that quadratic non-residues formed as residues modulo $M_i$ of $-r^2$ will also have 1 as the
value of the Jacobi symbol. 30
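
The bit-by-bit encoding can be sketched in a few lines of Python. This is a toy
illustration under stated assumptions: the primes 7919 and 7927 (both of the form
$4k + 3$) are far too small for real use, the function names are mine, and Euler's
criterion stands in for whatever Legendre-symbol routine the real system uses.

```python
import math
import secrets


def encrypt_bits(bits, modulus):
    """Encode each bit b as (-1)^b * r^2 mod modulus with a fresh nonce r."""
    out = []
    for b in bits:
        while True:
            r = secrets.randbelow(modulus - 1) + 1
            if math.gcd(r, modulus) == 1:
                break
        x = r * r % modulus
        out.append((-x) % modulus if b else x)
    return out


def decrypt_bits(cipher, p):
    """Decode with one secret prime factor p: residue -> 0, non-residue -> 1."""
    return [0 if pow(x % p, (p - 1) // 2, p) == 1 else 1 for x in cipher]
```

For example, with $P = 7919$ and $Q = 7927$, decrypt_bits(encrypt_bits(bits, P * Q), P)
returns bits unchanged. Each plaintext bit expands to a full residue modulo $M_i$, which
is why the scheme is used only to transfer a short DES key.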



29 We can actually just choose $r \in \mathbb{Z}_{M_i}$ and not bother to check that
$r \in \mathbb{Z}^*_{M_i}$. If $r \notin \mathbb{Z}^*_{M_i}$, this means that
$\gcd(M_i, r) \neq 1$ and we've just found a factor of $M_i$. Since factoring is assumed to be
difficult, this is a highly improbable event.

30 Some cryptographic protocols, such as RSA, leak information through the Jacobi symbol. In
RSA, plaintext and corresponding ciphertext always have the same value for their Jacobi
symbols. To see this, consider the Legendre symbol: if $L(x,p) = 1$, then there exists a
residue $r$ such that $r^2 \equiv x \bmod p$. But $x^e \equiv (r^e)^2 \bmod p$, so $x^e$ is a
quadratic residue as well. If $L(x,p) = -1$, then $L(x^e,p) = -1$ as well, since $e$ is odd.
Because $J(x,pq) = L(x,p) L(x,q)$, $J(x,pq) = J(x^e,pq)$ holds. This information leak can be
significant in some applications where only a limited number of messages or message formats
are used, since attackers can easily gather statistical information on the distribution of
messages.

When the receiver has decoded the bit sequence $b_j$ and reconstructed the message $m_i$,
he installs $m_i$ as the key for DES encryption of the communication channel. From this point
on, DES is used to encrypt all coprocessor-managed remote procedure call traffic between
the client and the server.

5.1.2. Authentication

Whether or not communication channels are secure against eavesdropping or tampering,
some form of authentication is needed to verify the identity of the communicating parties.
Even if the physical network links are secure, we still need to use authentication: to look 
up the communication ports of remote servers, we must ask a network name server on a 
remote, untrusted machine. Since we make no assumptions about the trustworthiness of 
the network name servers, even the identity of a remote host is suspect. In addition to 
the existing network name service, the secure coprocessor uses a White Pages server that 
maintains authentication information (in addition to key exchange moduli when applicable) 
and is itself an authenticated agent — the White Pages services have digitally signed 
authentication information associated with them, and so no directory lookup is required for 
them. The digital signature is generated by a central, trusted authority. For the purposes 
of this discussion, the role of the White Pages server is to serve as a repository of trusted 
authentication puzzles. Authentication is based on having the authenticator prove that it 
can solve the published puzzle without revealing the solution. 

The best available protocols for authentication all rely on a crucial observation made
by Rabin [67]: if one can extract square roots modulo $n$, where $n = pq$ with $p$ and $q$
primes, then one can factor $n$. This theorem has led the way to practical zero-knowledge
authentication protocols. Two important examples of practical zero-knowledge protocols include
an unpublished protocol first developed in 1987 by Rabin [74], and a protocol developed by
Feige, Fiat, and Shamir (the FFS protocol) [27]. Between the FFS and Rabin's protocols,
Rabin's method is much stronger because it provides a super-exponential security factor.
In contrast to Needham and Schroeder's authentication protocol [59], both of these
zero-knowledge authentication protocols require no central authentication server and thus
there is no single point of failure that would cripple the entire system. The Dyad system uses
a modified version of Rabin's authentication protocol. Like Rabin's protocol, my protocol is
decentralized and has a super-exponential security factor.

What do we mean when we say the authentication is zero-knowledge? By this we mean
that the entire authentication session may be open: an eavesdropper may listen to the
entire authentication exchange, but will gain no information at all that would enable him to
later masquerade as the authenticator.

Let's see how authentication works. After establishing a secure communication channel
with the remote entity, an agent queries the white pages server for its corresponding
party's authentication puzzle. Authentication puzzles are randomly generated when a new
authenticated entity is created and can be solved only by their owners, who know their
secret solutions. However, the remote entity does not exhibit a solution to its puzzle, but
rather is asked to show a solution to a randomized version of its puzzle. My puzzles are
again based on quadratic residuosity, this time not on deciding residuosity but on actually
finding square roots.

Whenever a new entity is created, an authentication puzzle/solution pair is created for it 
in an initial, once-only preparatory step — the puzzle is published in the local White Pages 
server, and the solution is given to the new task. The secure coprocessor creates a new 
puzzle for every new user of that coprocessor, and the White Pages directory is provided 
by the secure coprocessor which guarantees its integrity from tampering. 

The authentication puzzle consists of a modulus $M_i = p_i \cdot q_i$ and the vector

    $V_i = (v_{i,1}, v_{i,2}, \ldots, v_{i,n-1}, v_{i,n})$

where $p_i$ and $q_i$ are primes, and each $v_{i,j}$ is a quadratic residue in
$\mathbb{Z}^*_{M_i}$. The authentication modulus is distinct from the key exchange modulus; in
the authentication algorithm, it is not necessary for anyone to know the factors $p_i$ and
$q_i$, and in fact a single modulus can be used for all authentication puzzles. The secret
solution is the vector

    $S_i = (s_{i,1}, s_{i,2}, \ldots, s_{i,n-1}, s_{i,n})$

where the $s_{i,j}$ are roots of the equations $x^2 = 1/v_{i,j} \pmod{M_i}$. Generating a new
solution/puzzle pair is simple: we choose random $s_{i,j} \in \mathbb{Z}^*_{M_i}$ to form the
solution vector, and then element-wise square and invert $S_i$ modulo $M_i$ to form the puzzle
$V_i$.

Suppose a challenger $C$ wants to authenticate $A$'s identity. $C$ first randomly chooses a
boolean vector $E \in \{0, 1\}^n$:

    $E = (e_1, e_2, \ldots, e_{n-1}, e_n)$

where $E \circ E = \lfloor n/2 \rfloor$, and $\phi \in S_n$ a permutation. 31 We can represent
$\phi$ as a number $\varphi$ from $0$ to $n! - 1$ which represents elements of $S_n$ under a
canonical numbering. 32
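
One possible canonical numbering is the factorial number system (Lehmer code); the thesis
does not say which numbering it uses, so treat this choice, and the function names below,
as assumptions. The sketch also shows how a uniformly random index can be drawn by
generating $\lceil \log_2 n! \rceil$ random bits and retrying when the value is too large.

```python
import math
import secrets


def random_permutation_index(n):
    """Draw a uniform number in [0, n!) by rejection sampling (n >= 2)."""
    limit = math.factorial(n)
    nbits = (limit - 1).bit_length()      # ceil(log2(n!)) for n >= 2
    while True:
        candidate = secrets.randbits(nbits)
        if candidate < limit:
            return candidate


def unrank(index, n):
    """Map a number in [0, n!) to a permutation of range(n) (Lehmer code)."""
    items = list(range(n))
    perm = []
    for i in range(n, 0, -1):
        q, index = divmod(index, math.factorial(i - 1))
        perm.append(items.pop(q))
    return perm


def rank(perm):
    """Inverse of unrank: the canonical number of a permutation."""
    items = sorted(perm)
    index = 0
    for i, v in enumerate(perm):
        pos = items.index(v)
        index += pos * math.factorial(len(perm) - 1 - i)
        items.pop(pos)
    return index
```

Since $n!$ is more than half of $2^{\lceil \log_2 n! \rceil}$, each draw succeeds with
probability greater than 1/2, so fewer than two draws are needed on average.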

The pair $(E, \phi)$ is the challenge that $C$ will use to query $A$. Now, $C$ encodes $E$ and
$\varphi$ as follows:

    $\mathcal{C} = (c_1, c_2, \ldots, c_{n + \lceil \log(n!) \rceil})$

where

    $c_i = \begin{cases} (-1)^{e_i} \, t_i^2 \bmod M_{pub} & \text{if } 1 \le i \le n \\ (-1)^{\varphi_{i-n}} \, t_i^2 \bmod M_{pub} & \text{otherwise} \end{cases}$

where $\varphi_i$ denotes the $i$th bit of $\varphi$, the $t_i$ are nonce values from
$\mathbb{Z}^*_{M_{pub}}$, and $M_{pub}$ is the Blum modulus that is used by all entities in
this initial round, i.e., $M_{pub} = P_{pub} Q_{pub}$, where
$P_{pub} \equiv Q_{pub} \equiv 3 \pmod 4$. The values of $P_{pub}$ and $Q_{pub}$ are secret
and may be forgotten after $M_{pub}$ is generated.

$C$ sends the encoded challenge $\mathcal{C}$ to $A$.

31 $E \circ E$ denotes the dot product of $E$ with itself. $S_n$ denotes the symmetric group
of $n$ elements.

32 Note that this numbering provides a way to randomly choose $\phi \in S_n$: since $\varphi$
requires $\lceil \log(n!) \rceil$ bits to represent, we can simply generate
$\lceil \log(n!) \rceil$ random bits and use them as a number from $0$ to
$2^{\lceil \log(n!) \rceil} - 1$; if the number is greater than $n! - 1$, we try again. This
procedure terminates in an expected two tries, so on average we expend
$2 \lceil \log(n!) \rceil$ random bits. Other approaches are given in [24, 48].

When $A$ receives $\mathcal{C}$, $A$ computes the nonce vector

    $R = (r_1, r_2, \ldots, r_{n-1}, r_n)$

where the $r_j$ are randomly chosen from $\mathbb{Z}^*_{M_i}$, and the vector

    $X = (x_1, x_2, \ldots, x_{n-1}, x_n)$

where $x_j = r_j^2 \pmod{M_i}$. The authenticator sends $X$, called the puzzle randomizer, to
the challenger $C$, keeping the value of $R$ secret. As we will see in section 5.2.2, $X$ is
used to randomize the puzzle in order to keep the solution from being revealed.

$C$ responds to the puzzle randomizer with
$T = (t_1, t_2, \ldots, t_{n + \lceil \log(n!) \rceil})$, the vector of nonce values used to
compute $\mathcal{C}$. Using $T$, $A$ reconstructs $(E, \phi)$.

In response to the decoded challenge, $A$ replies with

    $Y = (y_1, y_2, \ldots, y_{n-1}, y_n)$

where $y_j = r_{\phi(j)} \cdot s_{i,j}^{e_j} \pmod{M_i}$. $Y$ is the response. To verify, the
challenger checks that $\forall j : x_{\phi(j)} = y_j^2 \, v_{i,j}^{e_j} \pmod{M_i}$.
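
The algebra of one round can be checked with a short Python sketch. It assumes the reading
in which the response is $y_j = r_{\phi(j)} s_j^{e_j}$ and the check is
$x_{\phi(j)} = y_j^2 v_j^{e_j}$, with puzzle entries $v_j = 1/s_j^2$; it omits the
challenge commitment step, uses a toy modulus, and all function names are hypothetical.
Modular inverses via pow(x, -1, m) need Python 3.8 or later.

```python
import math
import secrets


def rand_unit(m):
    while True:
        r = secrets.randbelow(m - 1) + 1
        if math.gcd(r, m) == 1:
            return r


def make_puzzle(m, n):
    """Secret solution s and public puzzle v with v_j = 1 / s_j^2 mod m."""
    s = [rand_unit(m) for _ in range(n)]
    v = [pow(s_j * s_j % m, -1, m) for s_j in s]
    return s, v


def make_randomizer(m, n):
    """Secret nonces r and public puzzle randomizer x with x_j = r_j^2 mod m."""
    r = [rand_unit(m) for _ in range(n)]
    x = [r_j * r_j % m for r_j in r]
    return r, x


def respond(r, s, e, phi, m):
    """Authenticator's reply: y_j = r_phi(j) * s_j^e_j mod m."""
    return [r[phi[j]] * (s[j] if e[j] else 1) % m for j in range(len(s))]


def verify(x, y, v, e, phi, m):
    """Challenger's check: x_phi(j) == y_j^2 * v_j^e_j mod m for every j."""
    return all(
        x[phi[j]] == y[j] * y[j] * (v[j] if e[j] else 1) % m
        for j in range(len(v))
    )
```

An honest authenticator always passes because
$y_j^2 v_j^{e_j} = r_{\phi(j)}^2 s_j^{2 e_j} s_j^{-2 e_j} = x_{\phi(j)}$, while a
responder without the correct $s_j$ fails at every position where $e_j = 1$.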

5.1.3. Merged Authentication and Secret Agreement

Instead of running key exchange and authentication as separate steps, I have a merged
protocol that performs secret agreement and authentication at the same time. The protocol
performs secret agreement rather than key exchange: after the protocol completes, both
parties will share a secret, but neither party in the protocol can control the final value of this
secret. This merged protocol has the advantage of eliminating a remote procedure call, but
requires that the authentication security parameter $n$ (the puzzle size) be at least $2m$,
where $m$ is the number of bits in a session key. We do not use this protocol in our current
version of the system, since we need a much weaker level of security than the $n = 2m$ level.
Our merged protocol goes as follows:

As in the normal key exchange protocol, each entity $i$ in the system calculates a Blum
modulus $M_i = P_i Q_i$, with $P_i$ and $Q_i$ primes of the form $4k + 3$. Entity $i$ keeps the
values of $P_i$ and $Q_i$ secret and publishes $M_i$. Entity $i$ also generates a random puzzle
by first generating the desired solution vector

    $S_i = (s_{i,1}, s_{i,2}, \ldots, s_{i,n})$

where the elements of $S_i$ are computed by $s_{i,j} = z_j^2$, where $z_j$ is a random number
from $\mathbb{Z}^*_{M_i}$. Then, $i$ publishes the puzzle vector

    $V_i = (v_{i,1}, v_{i,2}, \ldots, v_{i,n})$

with $v_{i,j} = 1/s_{i,j}^2$. With both $M_i$ and $V_i$ published, $i$ is ready to authenticate
and exchange keys.

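When the challenger $C$ wishes to verify $A$'s identity and obtain a session key from $A$,
$C$ first chooses a challenge as before, with $E \in \{0, 1\}^n$ such that
$E \circ E = \lfloor n/2 \rfloor$, and permutation $\phi \in S_n$. Just as in the previous
authentication protocol, $C$ encodes $E$ and $\phi$ as

    $\mathcal{C} = (c_1, c_2, \ldots, c_{n + \lceil \log(n!) \rceil})$

where

    $c_i = \begin{cases} (-1)^{e_i} \, t_i^2 \bmod M_{pub} & \text{if } 1 \le i \le n \\ (-1)^{\varphi_{i-n}} \, t_i^2 \bmod M_{pub} & \text{otherwise} \end{cases}$

where $\varphi$ is the canonical numbering of $\phi \in S_n$, $\varphi_i$ denotes the $i$th bit
of $\varphi$, $t_i$ is a nonce value from $\mathbb{Z}^*_{M_{pub}}$, and $M_{pub}$ is a Blum
modulus. $C$ sends $A$ the encoded challenge $\mathcal{C}$. Let $T$ denote the vector of nonce
values used to generate $\mathcal{C}$.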

$A$ computes a puzzle randomizer $X$ by first computing a pre-randomizer $R$, which will
be used to transmit the key bits. $A$ computes $R$

    $R = (r_1, r_2, \ldots, r_{n-1}, r_n)$

by randomly choosing the nonce vector

    $W = (w_1, w_2, \ldots, w_{n-1}, w_n)$

The values $w_j$ are chosen from $\mathbb{Z}^*_{M_a M_c}$, where $M_a$ is the published modulus
of $A$ and $M_c$ is the published modulus of $C$. The value of $R$ is obtained by setting
$r_j = (-1)^{b_j} w_j^2 \bmod M_a M_c$, where $b_j$ is a random bit. Some of these bits $b_j$
will form the secret transferred. Next, $A$ computes the puzzle randomizer $X$ from $R$ as
before, setting $x_j = r_j^2 \bmod M_a M_c$, and sends $X$ to $C$.

Now, $C$ reveals the challenge $(E, \phi)$ by sending $A$ the vector $T$; in response, $A$
sends $Y$ with

    $y_j = r_{\phi(j)} \cdot s_{a,j}^{e_j} \bmod (M_a M_c^{1-e_j})$

To verify $A$'s identity, $C$ checks that

    $\forall j : x_{\phi(j)} = y_j^2 \, v_{a,j}^{e_j} \bmod M_a$

There are $\lfloor n/2 \rfloor$ usable key bits transferred, and they correspond to those
$b_j$ for which $e_j = 0$.

To extract $b_j$, $C$ computes the Legendre symbol $L(y_j, P_c)$ to determine whether $y_j$ is
a quadratic residue. If $y_j$ is a quadratic residue, then $b_j = 0$; otherwise, $b_j = 1$.






5.1.4. Practical Authentication and Secret Agreement 



In this section, I present another protocol for simultaneous authentication and secret
agreement requiring two rounds of interaction but fewer random bits. Furthermore, the message
sizes are smaller, thus making this protocol more practical. This protocol strikes the best
balance between performance and security, and I have implemented it for Dyad.

Each agent $A$ who wishes to participate in the protocol generates a modulus $M_a$ with
secret prime factors $P_a$ and $Q_a$. Each agent also generates a vector of secret numbers

    $S_a = (s_{a,1}, s_{a,2}, \ldots, s_{a,n})$

where $s_{a,i} \in \mathbb{Z}^*_{M_a}$. From this $S_a$, $A$ computes

    $V_a = (v_{a,1}, v_{a,2}, \ldots, v_{a,n})$

where $v_{a,i} = 1/s_{a,i}^2 \bmod M_a$. Published for all to use is a modulus $M_{pub}$; the
two prime factors of $M_{pub}$, $P_{pub}$ and $Q_{pub}$, are forgotten as in the previous
protocol.

Now, suppose a challenger $C$ wishes to verify the identity of an authenticator $A$. Assume
the parties have published their moduli $M_c$ and $M_a$, respectively, and that $A$'s puzzle
vector $V_a$ has also been published. First, $C$ chooses a bit vector

    $E = (e_1, e_2, \ldots, e_n)$

where $E \circ E = \lfloor n/2 \rfloor$, and a permutation $\phi \in S_n$. The pair
$(E, \phi)$ is the challenge that $C$ will use later in authentication. Let
$c = \binom{n}{\lfloor n/2 \rfloor}$, the number of possible vectors $E$. Encode both $E$ and
$\phi$ as numbers using mappings $f : \{E\} \leftrightarrow \mathbb{Z}_c$ and
$g : S_n \leftrightarrow \mathbb{Z}_{n!}$. Let $\mathcal{E} = g(\phi) \cdot c + f(E)$ be the
combined encoding for the two parts of the challenge, 33 and let
$\mathcal{C} = \mathcal{E}^2 \bmod M_{pub}$. The value $\mathcal{C}$ is used to commit the
value of $C$'s challenge to $A$, preventing $C$ from changing it after learning the puzzle
randomizer. $C$ sends $\mathcal{C}$ to $A$.

In response, $A$ generates a puzzle randomizer by choosing

    $R = (r_1, r_2, \ldots, r_n)$

where each $r_i$ is a nonce value chosen from $\mathbb{Z}^*_{M_a M_c}$. $A$ creates the puzzle
randomizer vector $X$ from this by setting

    $X = (x_1, x_2, \ldots, x_n)$

where $x_i = r_i^4 \bmod M_a M_c$. $A$ sends $X$ to $C$. $C$ will have to recover some of the
values of $R$ in order for the protocol to work. These values will become the agreed-upon
secret used as private keys. $C$ will recover exactly those $r_i$ where $e_i = 0$. There are
exactly $\lfloor n/2 \rfloor$ such values. Let those $i$ such that $e_i = 0$ be the set $I$.

33 If $|\mathcal{E}| \neq |M_{pub}|$, extra random pad bits may be necessary.

When $C$ receives the puzzle randomizer, $C$ replies by revealing the challenge by sending
$\mathcal{E}$ to $A$.

$A$ verifies that this $\mathcal{E}$ encodes the challenge that corresponds to the challenge
commitment value $\mathcal{C}$ by checking that $\mathcal{C} = \mathcal{E}^2 \bmod M_{pub}$.
If the encoding is correct, $A$ extracts the challenge tuple $(E, \phi)$, and computes

    $Y = (y_1, y_2, \ldots, y_n)$

where $y_i = r_{\phi(i)}^2 \, s_{a,i}^{e_i} \bmod M_a M_c^{1-e_i}$.

Now $A$ composes a special vector $W$. The $i$th entry of this vector will be the pair

    $(w_i, h_{u_i}(w_i))$

where $i \in I$, $u_i = r_{\phi(i)}$, $w_i$ is a nonce value, and $h_{u_i}$ is an element of a
family $F$ of cryptographic hash functions. $A$ sends $Y$ and $W$ to $C$.

C verifies that >. ». / 

.Vi:^vf=^ (0 mod^- e ' . • ■ 

If each y x passes this test,'C then examines the values of y- x for which e, = 0: since 

yi = mod M c - 

and C knows the factorization of M C9 C can extract the four square roots ofy, mod M c , one 
of which was the original chosen by C. 34 To choose the proper root, of y h C uses the 
ith element of W, C can try all four square roots of y) mod M c and see which bne gives 
the value that matches the^value - sent by A. This assumes that F is immune from known 
plaintext attacks. (One class of functions that could be used as F is a family of encryption 
functions.) ; . 
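The root-recovery step can be sketched in C for toy-sized moduli. This is an illustration under stated assumptions, not Dyad's code: it assumes both prime factors of $M_c$ are congruent to 3 mod 4 (so a square root modulo each prime is a single exponentiation), it assumes the modulus is below $2^{32}$, and `toy_tag` is a hypothetical, non-cryptographic stand-in for the keyed hash $h_{u_i}$. All function names are invented for the example.

```c
#include <stdint.h>

/* Modular exponentiation; m must be below 2^32 so products fit in 64 bits. */
static uint64_t powmod(uint64_t b, uint64_t e, uint64_t m)
{
    uint64_t r = 1 % m;
    b %= m;
    while (e) {
        if (e & 1) r = r * b % m;
        b = b * b % m;
        e >>= 1;
    }
    return r;
}

/* Modular inverse by the extended Euclidean algorithm; gcd(a, m) must be 1. */
static uint64_t invmod(uint64_t a, uint64_t m)
{
    int64_t t = 0, newt = 1, r = (int64_t)m, newr = (int64_t)(a % m);
    while (newr != 0) {
        int64_t q = r / newr, tmp;
        tmp = t - q * newt; t = newt; newt = tmp;
        tmp = r - q * newr; r = newr; newr = tmp;
    }
    return (uint64_t)(t < 0 ? t + (int64_t)m : t);
}

/* Hypothetical stand-in for the keyed hash h_u(w). NOT cryptographic. */
uint32_t toy_tag(uint32_t u, uint32_t w)
{
    return (u * 2654435761u) ^ w;
}

/* The four square roots of y modulo p*q, assuming primes p, q = 3 (mod 4). */
void sqrt4(uint64_t y, uint64_t p, uint64_t q, uint64_t roots[4])
{
    uint64_t n  = p * q;
    uint64_t rp = powmod(y, (p + 1) / 4, p);   /* one root modulo p */
    uint64_t rq = powmod(y, (q + 1) / 4, q);   /* one root modulo q */
    uint64_t cp = q * invmod(q % p, p) % n;    /* CRT weight: 1 mod p, 0 mod q */
    uint64_t cq = p * invmod(p % q, q) % n;    /* CRT weight: 0 mod p, 1 mod q */
    uint64_t a  = rp * cp % n, b = rq * cq % n;
    roots[0] = (a + b) % n;
    roots[1] = (a + n - b) % n;
    roots[2] = (n - roots[0]) % n;
    roots[3] = (n - roots[1]) % n;
}

/* Return the root whose tag matches the one sent by A, or 0 if none does. */
uint64_t pick_root(uint64_t y, uint64_t p, uint64_t q, uint32_t w, uint32_t tag)
{
    uint64_t roots[4];
    sqrt4(y, p, q, roots);
    for (int i = 0; i < 4; i++)
        if (toy_tag((uint32_t)roots[i], w) == tag)
            return roots[i];
    return 0;
}
```

With the toy modulus 77 = 7 · 11 and nonce r = 13, the response is y = 15, whose four roots are 64, 57, 13, and 20; the tag on the nonce w singles out 13.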

5.1.5. Fingerprints 

Next, I describe the Karp-Rabin fingerprinting algorithm, which is crucial to Dyad's abil- 
ity to detect attackers or security problems in the underlying system. The key idea is 
this: associated with each file — in particular, every trusted program generated by trusted 
editors/compilers/assemblers/linkers/etc. — is a fingerprint which, like a normal check- 
sum, detects modifications to the data. Unlike normal checksums, however, fingerprints 
are parameterized by an irreducible polynomial 35 and the likelihood of an attacker forging a
fingerprint without knowing the irreducible polynomial is exponentially small in the degree
of the polynomial.

34 Standard algorithms for modular square root computation are given in [3, 7].

35 A polynomial $p(x) \in F[x]$ (F a field) is said to be irreducible if $\forall f(x) \in F[x]: f(x) \nmid p(x)$, $0 < \deg f < \deg p$,
i.e., the only divisors are p and nonzero elements of F (the units of F[x]). This is analogous to primality for
integers.

Dyad chooses random irreducible polynomials p from $\mathbb{Z}_2[x]$ of degree 31 by the algorithm
due to Rabin [45, 68, 71].

Here is one way to visualize the fingerprinting operation: We take the irreducible
polynomial p(x), arrange the coefficients from left to right in decreasing order, i.e., with
the $x^{31}$ term of p(x) at the leftmost position, and scan through the input bit stream from
left to right. If the bit in the input opposite the $x^{31}$ term is set, we exclusive-or p(x) into
the bit stream. As we scan down the bit stream all coefficients to the left of the current
position of the $x^{31}$ term of p(x) will be zeros. When we reach the end of the bit stream,
i.e., the $x^0$ term of p(x) is opposite the last bit of the input stream, we will have computed
$f(x) \bmod p(x) = \varphi(f(x))$.
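The scan described above amounts to polynomial long division over GF(2), one bit at a time. A minimal sketch in C follows. It assumes the polynomial is packed into a 32-bit word with bit 31 holding the $x^{31}$ coefficient, and that input bytes are read most significant bit first; the function name `fp_bitserial` is made up for this example.

```c
#include <stdint.h>
#include <stddef.h>

/* Bit-serial residue f(x) mod p(x) over GF(2).
 * p must have degree 31, i.e. bit 31 of p is set. */
uint32_t fp_bitserial(const uint8_t *data, size_t nbytes, uint32_t p)
{
    uint32_t r = 0;   /* invariant at loop top: bit 31 of r is clear */
    for (size_t i = 0; i < nbytes; i++) {
        for (int b = 7; b >= 0; b--) {
            r = (r << 1) | ((data[i] >> b) & 1u);   /* r(x) * x + next bit */
            if (r & 0x80000000u)                    /* x^31 term present?  */
                r ^= p;                             /* exclusive-or p(x) in */
        }
    }
    return r;
}
```

As a test value only, take $p(x) = x^{31} + x^3 + 1$ (`0x80000009`), a well-known irreducible trinomial; Dyad itself picks p at random. The four bytes `80 00 00 09` encode p(x) itself and leave residue 0, while the five bytes `01 00 00 00 00` encode $x^{32}$ and leave $x^4 + x$, i.e. `0x12`.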

5.2. Analysis of Algorithms 

5.2.1. Key Exchange 

The correspondence between the problem of deciding quadratic residuosity and the protocol 
is direct. For a detailed analysis, see [30].

5.2.2. Authentication 

What are the chances that a system breaker B could break the first (unmerged) authentication 
scheme? As we stated before, we assume that the modulus M is sufficiently large so that
factoring it is impractical. Now, consider what B must do to pose as A. 

Let us first look at a simpler authentication system to gain intuition. Let the puzzle and
the secret solution be v and s where $v = 1/s^2$; let the puzzle randomizer be $x = r^2$ (r known
only to the authenticator); let the challenge be $e \in \{0, 1\}$; and let the response be $y = r \cdot s^e$.
All calculations are done modulo M.

We claim that if B could slip through our authentication procedure with more than
probability $\frac{1}{2}$, then B could extract the square roots and thus factor M, violating our basic
assumption. To wit, in order for B to reliably pass the authentication procedure, it must be
able to handle the case where e is either 1 or 0, and thus it would need to know both r and
$r \cdot s$. This means that B would be able to compute the square root of $v$, which we know
from Rabin [67] is equivalent to factoring.
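One round of the simplified system can be traced with toy numbers. The sketch below assumes the verifier's check is $y^2 v^e \equiv x \pmod{M}$, which follows from the definitions $v = 1/s^2$, $x = r^2$, $y = r \cdot s^e$ but is not spelled out in the text; the function names are illustrative, and the 64-bit arithmetic is only adequate for small moduli.

```c
#include <stdint.h>

/* (a * b) mod m for toy-sized operands (m below 2^31, so no overflow). */
static int64_t mulmod(int64_t a, int64_t b, int64_t m)
{
    return (a % m) * (b % m) % m;
}

/* Modular inverse by the extended Euclidean algorithm; gcd(a, m) must be 1. */
static int64_t invmod(int64_t a, int64_t m)
{
    int64_t t = 0, newt = 1, r = m, newr = a % m;
    while (newr != 0) {
        int64_t q = r / newr, tmp;
        tmp = t - q * newt; t = newt; newt = tmp;
        tmp = r - q * newr; r = newr; newr = tmp;
    }
    return t < 0 ? t + m : t;
}

/* Public puzzle v = 1/s^2 mod m, derived from the secret s. */
int64_t make_puzzle(int64_t s, int64_t m)
{
    return invmod(mulmod(s, s, m), m);
}

/* Puzzle randomizer x = r^2 mod m. */
int64_t make_randomizer(int64_t r, int64_t m)
{
    return mulmod(r, r, m);
}

/* Response y = r * s^e mod m for a one-bit challenge e. */
int64_t respond(int64_t r, int64_t s, int e, int64_t m)
{
    return e ? mulmod(r, s, m) : r % m;
}

/* Verifier's check: y^2 * v^e == x (mod m). */
int verify(int64_t x, int64_t y, int64_t v, int e, int64_t m)
{
    int64_t lhs = mulmod(y, y, m);
    if (e) lhs = mulmod(lhs, v, m);
    return lhs == x % m;
}
```

With M = 77, s = 10, and r = 13, the puzzle is v = 67 and the randomizer is x = 15; the honest responses 13 (for e = 0) and 53 (for e = 1) both verify. Someone who can answer both challenges for the same x holds r and r·s, and so can divide to obtain s.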

What must B do in the full version of the authentication? In order to pass the challenge,
B must know the value of E. In addition, B must know part of $\phi$. In particular, B does not
have to guess all of $\phi$ but only those values selected by the 1 entries in E.

Thus, while

$$\left|\{(E, \phi) : E \in \{0,1\}^n,\ E \cdot E = \lfloor n/2 \rfloor,\ \phi \in S_n\}\right| = \binom{n}{\lfloor n/2 \rfloor}\, n!,$$

our security factor (the inverse of the probability of breaking the system) is slightly
smaller. Our authentication system provides, for puzzles of n numbers, a probability of an
attacker breaking the authentication system of

$$P = \frac{((n/2)!)^3}{(n!)^2} \approx \frac{\sqrt{\pi n}\; e^{n/2}}{2^{3n/2+1}\, n^{n/2}}$$

(using Stirling's approximation $n! \approx \sqrt{2\pi n}\, n^n e^{-n}$), which shows that P is clearly
super-exponentially small. By using longer vectors or multiple vectors (iterating) the security
factor can be made arbitrarily high. Note that since the security factor is super-exponential
in n, the puzzle size, and only multiplicative when the protocol is iterated, increasing puzzle
size is usually preferable: If $n'$, the new size of the puzzle, is $2n$, then the probability of
successfully breaking the system becomes

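$$
\begin{aligned}
P' &\approx \frac{\sqrt{\pi \cdot 2n}\; e^{2n/2}}{2^{3(2n)/2+1}\,(2n)^{2n/2}} \\
   &= \frac{\sqrt{2\pi n}\; e^{n}}{2^{3n+1}\,(2n)^{n}} \\
   &= \frac{\sqrt{2\pi n}\; e^{n}}{2^{3n/2+1}\, 2^{3n/2}\, 2^{n}\, n^{n}} \\
   &= \frac{2\,(\sqrt{\pi n}\; e^{n/2})^2}{2^{3n/2+1}\, 2^{3n/2+1}\, 2^{n-1}\sqrt{2\pi n}\; n^{n}} \\
   &= \frac{P^2}{2^{n-2}\sqrt{2\pi n}}
\end{aligned}
$$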

On the other hand, if we simply run the protocol twice, we would only obtain $P' = P^2$.
Iterating does have one advantage: it makes the selection of the security factor (1/P) flexible.
Using iteration makes it easy for applications at different security levels to negotiate
the desired security of the connection.

How did we arrive at the expression for P? 1/P simply measures the number of
equiprobable random states visible to the attacker. First, note that $\binom{n}{\lfloor n/2 \rfloor}$ is the number of
different E where $E \cdot E = \lfloor n/2 \rfloor$ (i.e., the number of 1 bits in E is $\lfloor n/2 \rfloor$). The $n!/(n - i)!$ term
gives the number of ways of choosing i objects from n without replacement, which is what
the projection of the permutation $\phi$, as specified by the on (i.e., 1) values in E, gives us.
Why do we restrict E to have $\lfloor n/2 \rfloor$ on bits? If $j = E \cdot E$ could be any value, then there
would be $\sum_{k=0}^{n} \binom{n}{k} \frac{n!}{(n-k)!}$ different states visible to B, not all of which would be equiprobable.
If E and $\phi$ are chosen uniformly from $\{0,1\}^n$ and $S_n$ respectively, it can be seen that
the state corresponding to $j = 0$ is most probable. This weakens the security of our
algorithm. In the limit case where E is the zero vector, our algorithm no longer provides
super-exponential security.
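The counting argument can be checked exactly for small puzzles. The sketch below computes the security factor $1/P = \binom{n}{\lfloor n/2 \rfloor} \cdot n!/(n - \lfloor n/2 \rfloor)!$ in 64-bit integers; the function name is illustrative, and the result only fits for n up to about 20.

```c
#include <stdint.h>

/* Security factor 1/P = C(n, h) * n!/(n-h)!, with h = floor(n/2).
 * Exact in 64 bits for n <= 20. */
uint64_t security_factor(unsigned n)
{
    unsigned h = n / 2;
    uint64_t choose = 1, falling = 1;
    for (unsigned i = 1; i <= h; i++)
        choose = choose * (n - h + i) / i;   /* stays integral at each step */
    for (unsigned i = 0; i < h; i++)
        falling *= (n - i);                  /* n (n-1) ... (n-h+1) */
    return choose * falling;
}
```

For n = 4 this gives 6 · 12 = 72, and for n = 8 it gives 70 · 1680 = 117600. Running the n = 4 protocol twice only reaches 72² = 5184, which illustrates why doubling the puzzle size is preferred over iterating.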

Note that my protocol provides super-exponential security only if the moduli remain 
unfactored. Since there is an exponential time algorithm for factoring, it is always possible 
to break the system in the minimum of the time for factoring and the super-exponential 
bound. Thus we can scale our protocol in a variety of ways. 

The authentication protocol not only provides super-exponential security when the
moduli cannot be factored, but is also zero knowledge. The encoded challenge vector, $\mathcal{C}$,
performs bit commitment [13, 14, 21, 72, 73], forcing C to choose the challenge values prior
to A choosing the puzzle randomizer. This means that E and $\phi$ cannot be a function of X,
and thus the challenger's side of the protocol can be simulated by an entity that does not
have knowledge of any of the secrets. Any entity S can simulate both sides of the protocol:
S can choose random E and $\phi$ and, knowing their values, construct vectors X and Y that
will pass the verification step:

$$y_j = r_j, \quad x_j = r_j^2 \qquad\qquad \text{if } e_j = 0$$

$$y_j = r_j, \quad x_j = r_j^2 \, v_{\phi(j)} \qquad \text{if } e_j = 1$$

Note that my model differs slightly from the usual model for zero knowledge interactive
proofs because both the prover and the verifier are assumed to be polynomial time (and
that factoring and quadratic residuosity are not in polynomial time); if the prover were
infinitely powerful, as in the usual model, the prover could simply factor the moduli used
in the bit commitment phase of our protocol. Other bit commitment protocols may be used
instead; e.g., we could use a protocol based on the discrete log problem [83], requiring more
multiplications but using fewer random bits.

5.2.3. Merged Authentication and Secret Agreement 

Like the first authentication algorithm, the merged authentication and key exchange algorithm
reveals no information if factoring and deciding quadratic residuosity are intractable.

How does the merged algorithm differ from the original algorithm? I use $M_a M_c$ as the
modulus for the nonce vectors, and I use quartic residues instead of quadratic residues for
the puzzle randomization vector X.

No information is leaked. An analysis similar to that done above establishes this fact.
When $e_j = 1$, we know that

$$
\begin{aligned}
y_j &= (-1)^{b_j}\, r_j^2\, s_{a,\phi(j)} \bmod M_a \\
    &= (-1)^{b_j}\, \bigl(r_j \sqrt{s_{a,\phi(j)}}\bigr)^2 \bmod M_a
\end{aligned}
$$

so $y_j$ looks like the square of a random number, possibly negated, in $\mathbb{Z}^*_{M_a}$. The challenger C
or an eavesdropper could have generated this without A's help. (Note that the reason that
this value is computed modulo $M_a$ is because $s_{a,j}$ is the residue modulo $M_a$ of a random
square; if we computed $y_j$ modulo $M_a M_c$, we would have no guarantees as to whether $s_{a,j}$
would be a quadratic residue.)

When $e_j = 0$, we have

$$y_j = (-1)^{b_j}\, r_j^2 \bmod M_a M_c$$

This is just the square of a random value, possibly negated, in $\mathbb{Z}^*_{M_a M_c}$. The challenger C or
any eavesdropper could have generated this without A's help as well.

This proves that one atomic round of the authentication leaks no information. As
with the vanilla authentication, the vectors C and T provide bit commitment, forcing the
challenge $(E, \phi)$ to be independent of X; thus running the atomic rounds in parallel rather
than in serial has no impact on the proof of zero knowledge.

Might some system breaker B compromise the authentication? To do so, B must guess
the values of E and $\phi$ just as in the vanilla authentication protocol. As before, the probability
of somebody breaking the authentication is super-exponentially small. (See section 5.2.2.)

The bits of the session key ($b_j$) are transferred only when $e_j = 0$. When $e_j = 1$, C cannot
determine the quadratic residuosity of the element $y_j$ since we assume that determining
quadratic residuosity is intractable without the factorization of $M_a$. When $e_j = 0$, on the
other hand, C can easily determine the quadratic residuosity of $y_j$ by simply evaluating the
Legendre symbol $L(y_j, P_c)$.
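The decoding step can be illustrated with Euler's criterion. The sketch below assumes, as the analysis above suggests, that a key bit is carried by optionally negating a square, and that the prime factors of the modulus are congruent to 3 mod 4 (the Blum property), so that negation turns a residue into a non-residue. The names `encode_bit` and `decode_bit` are invented here, and the arithmetic only suits toy moduli below $2^{32}$.

```c
#include <stdint.h>

/* Modular exponentiation; m must be below 2^32 so products fit in 64 bits. */
static uint64_t powmod(uint64_t b, uint64_t e, uint64_t m)
{
    uint64_t r = 1 % m;
    b %= m;
    while (e) {
        if (e & 1) r = r * b % m;
        b = b * b % m;
        e >>= 1;
    }
    return r;
}

/* Legendre symbol of a modulo an odd prime p, via Euler's criterion:
 * returns 1 for a residue, -1 for a non-residue, 0 if p divides a. */
int legendre(uint64_t a, uint64_t p)
{
    uint64_t t = powmod(a, (p - 1) / 2, p);
    if (t == 0) return 0;
    return t == 1 ? 1 : -1;
}

/* Sender side: (-1)^b * r^2 mod m. */
uint64_t encode_bit(uint64_t r, int b, uint64_t m)
{
    uint64_t sq = (r % m) * (r % m) % m;
    return b ? (m - sq) % m : sq;
}

/* Receiver side, knowing a prime factor p of m with p = 3 (mod 4). */
int decode_bit(uint64_t y, uint64_t p)
{
    return legendre(y, p) == -1;
}
```

With modulus 77 and the receiver's factor 7, the nonce r = 3 encodes bit 0 as 9 and bit 1 as 68; reduced modulo 7 these are 2 (a residue) and 5 (a non-residue), so the receiver recovers the bit. A party without a factor of the modulus has no comparable shortcut.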

5.2.4. Practical Authentication and Secret Agreement 

Assuming that factoring is intractable, the third protocol (my "practical" protocol) is also
zero knowledge. In particular, breaking this protocol is equivalent to factoring: any system
breaker B who has a strategy that allows B to masquerade as A can trivially adapt the
strategy to factor the various moduli in the system.

Let us examine how this authentication/secret agreement protocol differs from the previous
one. Instead of using the quadratic residuosity decision problem to do bit commitment,
this protocol uses the Rabin function, removing the requirement that the moduli have the
Blum property. Since neither A nor C can factor $M_{pub}$, neither of them can extract the square root
of an arbitrary number mod $M_{pub}$. In particular, A has no way of getting the encoding $\mathcal{E}$
from the commitment value $\mathcal{C}$; the only way A finds out the value of $\mathcal{E}$ (and thus the value
of $(\phi, E)$) is for C to reveal $\mathcal{E}$. The challenge commitment works as before.

The analysis for the authentication properties is identical to that for the previous
protocols, so I omit that here. (See section 5.2.2.) What about the zero-knowledge property?

When $e_j = 1$, we know that

$$y_j = r_j^2 \, s_{a,\phi(j)} \bmod M_a$$

so $y_j$ looks like the square of a random number in $\mathbb{Z}^*_{M_a}$. The challenger C or an eavesdropper
could have generated this without A's help. Note that the reason that this value is computed
modulo $M_a$ is because $s_{a,j}$ is the residue modulo $M_a$ of a random square; if we computed
$y_j$ modulo $M_a M_c$, we would have no guarantees as to whether $s_{a,j}$ would be a quadratic
residue.

When $e_j = 0$, we have

$$y_j = r_j^2 \bmod M_c$$

This is just the square of a random value in $\mathbb{Z}^*_{M_c}$. The challenger C or any eavesdropper
could have generated this without A's help as well.

In both cases, a simulator S who pretends to be A and is able to control the coin flips 
of C can easily produce a run of the protocol where the message traffic is indistinguishable 
from that of an actual run. Since S can simulate the protocol without the secret known only
to A, the protocol is zero knowledge. 

This protocol is much more efficient than the previous one, since it sends a factor of 
$|M_c|$ more secret bits than the previous algorithm; this efficiency is somewhat offset by the
fact that root extraction must be performed by the receiver, and extracting square roots is 
more expensive than computing the Legendre symbol. 

5.2.5. Fingerprints 

Before we analyze the performance of the fingerprint algorithm, we will fix some notation.
We let p (or p(x)) refer to an irreducible polynomial of degree m (where m is prime). We
use the symbol $\twoheadrightarrow$ to denote surjective mappings, and $\overline{F}$ to denote the algebraic closure of
the field F.

How good is the fingerprint algorithm? Choosing random irreducible polynomials
is equivalent to choosing random homomorphisms $\varphi: \mathbb{Z}_2[x] \twoheadrightarrow GF(2^m)$, where the kernel
of $\varphi$ is the ideal generated by the irreducible polynomial p. To be precise, $\varphi$ associates
the indeterminate x with u, a root of the irreducible polynomial in the field $\overline{\mathbb{Z}_2}$, i.e.,
$\varphi: \mathbb{Z}_2[x] \twoheadrightarrow \mathbb{Z}_2(u) = GF(2^m)$. There are exactly $(2^m - 2)/m$ such homomorphisms. To
compute the fingerprint of a file, consider the contents of the file as a large polynomial in
$\mathbb{Z}_2[x]$: take the data as a string of bits $b_n, \ldots, b_1, b_0$, and construct the polynomial
$f(x) = \sum_{i=0}^{n} b_i x^i$. The fingerprint is exactly $\varphi(f(x))$.

Now, $f$ can have at most $n/m$ divisors of degree m. Any two distinct polynomials $f_1$
and $f_2$ will have the same residue if $f_1 - f_2 \equiv 0 \bmod p$. The number of degree-m irreducible divisors
of $f_1 - f_2$ is at most $n/m$, so the probability that a random irreducible polynomial gives
the same residue for $f_1$ and $f_2$ is $\frac{n/m}{(2^m - 2)/m} = n/(2^m - 2)$. For a page of memory containing
4 kilobytes of data ($n = 2^{15}$, or 32 kilobits), and setting m to be 31, this probability is less
than 0.002%.

This 0.002% probability measures the likelihood that a replacement for a 4 kilobyte
page of a file would have a residue that matched that of the original. Because the adversary
has no knowledge of the particular homomorphism used, there is no better strategy than
guessing a polynomial (i.e., the data in the replacement page). The probability that the
adversary could guess the homomorphism is $m/(2^m - 2)$, or less than 0.0000015%, which
is much less likely. Hence we can see that the fingerprint algorithm is an excellent choice
as a cryptographic checksum.
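The two percentages quoted above can be reproduced directly from the bounds $n/(2^m - 2)$ and $m/(2^m - 2)$. The helper names below are illustrative only, and the sketch assumes m is below 64 so that $2^m$ fits in a 64-bit integer.

```c
/* Upper bound n / (2^m - 2) on the chance that a random irreducible
 * polynomial of prime degree m gives two distinct n-bit inputs the
 * same residue. Requires m < 64. */
double collision_bound(unsigned n_bits, unsigned m)
{
    return (double)n_bits / (double)((1ULL << m) - 2);
}

/* Chance m / (2^m - 2) of guessing the polynomial itself. */
double guess_polynomial(unsigned m)
{
    return (double)m / (double)((1ULL << m) - 2);
}
```

For a 4 kilobyte page (n = 2^15 bits) and m = 31, `collision_bound` gives about 1.53e-5, i.e. roughly 0.0015%, and `guess_polynomial` gives about 1.44e-8, i.e. roughly 0.0000014%, matching the stated upper limits.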

The naive implementation of this algorithm is quite fast, but it is possible to achieve
even faster algorithms by precomputation. Given a fixed p and a set of small polynomials,
we construct a table T of residues of those polynomials. I initially describe the algorithm
for arbitrary sized p; afterwards, I describe optimizations specific to $m = \deg p = 31$.

Let T be the table of residues of all polynomials of the form $g(x) \cdot x^m$, where g varies over
polynomials of degree less than k. In other words, T gives us the function $\varphi(g(x) \cdot x^{\deg p})$
where $\deg g(x) < k$. Using T allows us to examine k bits at a time from the input stream
instead of one at a time. View $f(x)$ now as

$$f(x) = \sum_{i=0}^{\lceil n/k \rceil} a_i(x)\, x^{ik}$$

where $\deg a_i(x) < k$. The algorithm to compute the residue $r(x) = f(x) \bmod p(x)$ becomes
the code shown in Figure 5.1.

r(x) = 0;
for (i = ⌈n/k⌉; i >= 0; --i) {
    r'(x) = r(x) · x^k + a_i(x);
    r(x) = r'(x) mod p(x);
}

Figure 5.1 Fingerprint residue calculation. The operation r'(x) mod p(x)
is performed by decomposing r' into $g(x) \cdot x^m + h(x)$, where $\deg g < k$ and
$\deg h < m$, finding $r''(x) = g(x) \cdot x^m \bmod p(x)$ from T, and setting $r(x) = r''(x) + h(x)$.

If we fix the value $m = \deg p = 31$, we can realize further size-specific optimizations.
We can represent p exactly in a 32-bit word. Furthermore, since word at a time operations
work on 32 bits at a time, by packing the coefficients as bits in a word we can perform
some basic operations on the polynomials as bit shifts and exclusive-ors: multiplication by
$x^k$ is a left-shift by k bits; addition or subtraction of two polynomials is just exclusive-or.
Of course, since we are dealing now with fixed-size machine registers, we must take care
not to overflow.

In Dyad, I have two versions of the fingerprinting code, one for k = 8 and the other for
k = 16, both of which use irreducible polynomials of degree 31. To read the input stream
a full 32-bit word at a time, I modified the algorithm slightly: instead of T being a table of
$\varphi(g(x) \cdot x^{\deg p})$, T contains $\varphi(g(x) \cdot x^{32})$; the code above is modified correspondingly. While
the residues $\varphi(g(x) \cdot x^{32})$ require only 31 bits to represent, T is represented as a table of
machine words with $2^k$ entries. The program can uniquely index each entry by evaluating
g(x) at the point x = 2 (this index is just the coefficient bits of g, which are already stored in
a machine word as an integer). If we run the code in a loop to perform this operation, we will
get a 32-bit result, which represents a polynomial of degree at most 31. Hence the result of
the loop, r(x), is either the residue $R(x) = f(x) \bmod p(x)$ or $R(x) + p(x)$, and the following
simple computation fixes up the result:

$$\varphi(f(x)) = \begin{cases} r(u) & \text{if } \deg r(x) < 31 \\ (r - p)(u) & \text{otherwise} \end{cases}$$

A particularly elegant implementation is achieved when we set k to be 8 or 16. The code in
Figure 5.2 illustrates the algorithm for k = 16.

#include <stdint.h>

uint32_t
fp_mem(const uint32_t *a, int nwords, uint32_t p, const uint32_t *table)
{
    uint32_t r, rlo, rhi, a_i;
    int i;

    r = 0;
    for (i = 0; i < nwords; i++) {
        a_i = a[i];
        /* fold in the high 16 bits of the input word */
        rhi = r >> 16;
        rlo = (r << 16) ^ (a_i >> 16);
        r = rlo ^ table[rhi];
        /* fold in the low 16 bits of the input word */
        rhi = r >> 16;
        rlo = (r << 16) ^ (a_i & ((1u << 16) - 1));
        r = rlo ^ table[rhi];
    }
    /* r may still carry an x^31 term; reduce once more */
    if (r >= (uint32_t)1 << 31) r ^= p;
    return r;
}

Figure 5.2 Fingerprint calculation (C code).

This C code shows how using a precomputed table of partial residues can speed up fin- 
gerprint calculations. Unlike the actual code within Dyad, it omits loop unrolling, forces 
memory to be aligned, and may perform unnecessary memory references. 
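For readers who want to run the k = 16 scheme end to end, the following sketch adds a table builder. It is not Dyad's code: the table is filled by brute force (32 shift-and-reduce steps per entry) rather than by the bootstrap from a k = 8 table, and the names `table16`, `fp_build_table`, and `fp_words` are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

#define K 16
static uint32_t table16[1u << K];   /* T[g] = g(x) * x^32 mod p(x) */

/* Multiply a residue (bit 31 clear) by x^n modulo p; p has bit 31 set. */
static uint32_t mul_xn(uint32_t r, int n, uint32_t p)
{
    while (n-- > 0) {
        r <<= 1;
        if (r & 0x80000000u)
            r ^= p;
    }
    return r;
}

/* Fill the table for a given degree-31 polynomial p (brute force). */
void fp_build_table(uint32_t p)
{
    for (uint32_t g = 0; g < (1u << K); g++)
        table16[g] = mul_xn(g, 32, p);
}

/* Word-at-a-time fingerprint, two 16-bit steps per 32-bit input word. */
uint32_t fp_words(const uint32_t *a, size_t nwords, uint32_t p)
{
    uint32_t r = 0;
    for (size_t i = 0; i < nwords; i++) {
        r = ((r << 16) ^ (a[i] >> 16))     ^ table16[r >> 16];
        r = ((r << 16) ^ (a[i] & 0xFFFFu)) ^ table16[r >> 16];
    }
    if (r & 0x80000000u)    /* final fix-up: clear a leftover x^31 term */
        r ^= p;
    return r;
}
```

Using the test polynomial $x^{31} + x^3 + 1$ (`0x80000009`, chosen here only for illustration), the single word `0x80000009` has fingerprint 0, and the two words `0x00000001, 0x00000000` (the polynomial $x^{32}$) have fingerprint `0x12`, the same values a bit-at-a-time division yields.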

For the case where k = 16, initializing T will be time consuming if we use the simple
brute force method. Instead of calculating each of the $2^{16}$ entries directly, we first compute
the table T' for k = 8, size 256, and then T is bootstrapped from T' in the obvious manner:
for each entry in T, we simply use its index g(x), decompose it into $g(x) = g_{hi}(x) \cdot x^8 + g_{lo}(x)$
where $\deg g_{hi} < 8$ and $\deg g_{lo} < 8$, and compute $T'[T'_{hi}(g_{hi}) \oplus g_{lo}] \oplus T'_{lo}(g_{hi}) \cdot x^8$ as the table
entry.

If a higher security level is required, multiple fingerprints can be taken on the same
data, or polynomials of higher degree may be used. The speedup techniques extend well to
handle $\deg p(x) = 61$, the next prime 36 close to a multiple of word size, though the number
of working registers required (if implemented on a 32-bit machine) doubles. Our current
implementation is largely limited by the main memory bandwidth on the Citadel CPU's bus
for reading the input data and the table size. Note that the table for k = 8 can easily fit in
most modern CPU memory caches. If we use main memory to store intermediate results,
performance dramatically degrades.

36 While the algorithm for finding irreducible polynomials does not require that the degree be prime, using
polynomials of prime degree makes counting irreducibles simpler.



Chapter 6

Bootstrap and Maintenance 



On the face of it, securely initializing and bootstrapping a secure coprocessor's system 
software can be very simple: burn all the code into the embedded ROM so the coprocessor 
will always run secure code. Unfortunately, this strategy is unrealistic. 

Practical requirements complicate the secure initialization and bootstrap of secure soft- 
ware running in a secure coprocessor: 

• maintenance and revocation updates of the trusted software by the secure coprocessor 
system software vendor (or a trusted authority); 

• installation of optional software by local system administrators; 

• efficiency of secure bootstrap; and 

• security.

Two aspects of bootstrapping go hand in hand: secure bootstrapping, and bootstrapping 
security. The former deals with verifying code integrity so untrusted code will not be 
executed with any privileges 37 , and the latter deals with increasing security guarantees 
provided by the system related to bootstrapping, using basic security properties of lower 
system levels as a basis [97, 98]. 

The process of secure bootstrapping must provide means of proving the trustworthiness 
and correct initialization of the final system to the end user. Additionally, depending on the 
users' degree of trust in the secure coprocessor hardware vendors / system software vendors, 
we may need to prove to the user (or site security administrator) that the coprocessor 
hardware (having passed through the system software vendor for initialization) is legitimate. 
This chapter addresses bootstrapping; the next chapter addresses the verification of system 
software and hardware. 

Digital signatures and cryptographic checksums are basic tools we use to attack secure 
initialization, bootstrapping, and maintenance. These tools are applied by each layer of 
bootstrapping code to verify the integrity and authenticity of the next higher layer, ensuring 
that only trusted code is booted. 



37 The secure coprocessor, when booted, runs a secure form of the Mach microkernel. If administered correctly,
untrusted user-level code may be loaded and run after booting.

6.1. Simple Secure Bootstrap



As a thought experiment, consider the simplest instantiation of secure bootstrapping: the 
bootstrap ROM for the secure coprocessor contains digital signature checking code. At 
boot time, this digital signature code verifies that the host-supplied kernel image is from 
a trusted authority. The trusted authority's public key may be kept in ROM rather than
secure RAM, since only integrity and not secrecy is required. 38 The security kernel uses an
encrypted file system image supplied by the host to load system servers and applications
(the decryption key is kept in secure RAM). This preserves privacy and integrity guarantees
for the rest of the operating system and the applications, thus securely bootstrapping to a
fully running system.

There are several things wrong with the above scenario. It is inflexible: it allows only
centralized updates of system software and data; it requires (computationally expensive)
digital signature verification for the kernel; it does not permit revocation of old microkernel
images (which may have security bugs); and it does not permit resetting of the coprocessor.
Fortunately, by providing a layered security bootstrap, all these flaws can be fixed.

6.2. Flexible Secure Bootstrap and Maintenance 

By necessity, secure bootstrapping starts with code embedded in the coprocessor's ROM. 
This code must be simple — because such embedded code cannot be fixed, its correctness 
must be certain. This code must be public — an attacker can gain access to it by destroying 
a secure coprocessor's physical encapsulation. To allow more complex boot code to be 
used, the boot process proceeds in stages, where the primary boot code in ROM loads in a 
secondary boot loader from an external source.

Dyad assumes a write-only model of installing the secondary boot code. The secondary
boot code, along with any private data it needs, is stored in secure RAM after the secure
RAM is cleared. There is no need to trust secondary boot code since no secrets are stored
in the secure coprocessor at initialization time; furthermore, users wishing to perform
behavioral testing of the secure coprocessor hardware may load their own code at this point
to validate the hardware.

The secondary boot code loaded by a trusted secure coprocessor software vendor is
loaded with a secret allowing secondary boot code to authenticate its identity. This secret is
loaded at the same time as the secondary boot code, and is privacy protected: (1) the tamper
detection circuitry will erase the secure RAM if any physical intrusion is detected; (2) the
primary bootstrap loader will erase secure RAM prior to loading other secondary bootstrap
code; and (3) the secondary bootstrap code reveals not even partial information about its
authentication secret, since it uses a zero knowledge authentication. In addition to the
authentication secret, the secondary boot code is provided with cryptographic checksums
of the coprocessor kernel and coprocessor system programs, permitting validation of the
next higher layer of code.

38 The ROM in the coprocessor cannot provide secrecy, since an attacker can sacrifice a secure coprocessor to
discover ROM contents (which are likely to be uniform across all secure coprocessors).

To limit the amount of secure RAM used, Dyad stores just the authentication secrets 
and a cryptographic checksum of the secondary boot code, with actual secondary bootstrap 
code being read from the host's disk or other external memory at boot time. 39 

This method of initializing the secure coprocessor permits loading of both secure coprocessor
vendor authentication data as well as verification data for secondary boot code,
yet prevents reinitialization from leaking sensitive data.

The primary boot code is permitted only two operations: installing the secondary
boot code along with its authentication secrets; and loading, validating, and running the
secondary boot. 40 The secondary boot code authenticates its identity, and thus the identity
of the secure coprocessor software vendor, to the user. It also validates and boots the
secure coprocessor kernel.

Secondary boot code in secure RAM can permit multiple versions of secure coprocessor
kernels, since it can store several cryptographic checksums, each corresponding to a different
coprocessor kernel. This permits the system administrators to back out the coprocessor
kernel if bugs are ever discovered. Because these cryptographic checksums are kept in
secure RAM, the coprocessor kernel may update them as newer kernels are released.
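The validation of a kernel image against several stored checksums might look like the sketch below. Everything here is assumed for illustration: the `boot_record` layout, the limit of four entries, and the use of a 31-bit polynomial fingerprint as the checksum are not taken from Dyad's implementation.

```c
#include <stdint.h>
#include <stddef.h>

#define MAX_KERNELS 4                 /* hypothetical limit on stored checksums */

struct boot_record {                  /* hypothetical secure-RAM layout */
    uint32_t poly;                    /* secret degree-31 irreducible polynomial */
    uint32_t allowed[MAX_KERNELS];    /* fingerprints of acceptable kernels */
    size_t   n_allowed;
};

/* Bit-serial fingerprint: image(x) mod poly(x) over GF(2). */
static uint32_t fingerprint(const uint8_t *data, size_t n, uint32_t p)
{
    uint32_t r = 0;
    for (size_t i = 0; i < n; i++)
        for (int b = 7; b >= 0; b--) {
            r = (r << 1) | ((data[i] >> b) & 1u);
            if (r & 0x80000000u)
                r ^= p;
        }
    return r;
}

/* Returns 1 if the image matches one of the stored fingerprints, else 0. */
int kernel_is_trusted(const struct boot_record *br,
                      const uint8_t *image, size_t len)
{
    uint32_t fp = fingerprint(image, len, br->poly);
    for (size_t i = 0; i < br->n_allowed; i++)
        if (br->allowed[i] == fp)
            return 1;
    return 0;
}
```

Backing out to an older kernel then needs no change to the boot code: the older kernel's checksum is simply one of the `allowed` entries. Retiring a kernel with a known security bug corresponds to removing its entry.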

6.3. Hardware-level Maintenance 

So far, I have discussed only software maintenance. Because secure coprocessors contain
critical data, we need to also support hardware maintenance related functions. We may want
secure coprocessors to perform self-tests while otherwise idle, and generate warnings if any
transient errors are detected (e.g., correctable memory ECC errors, encryption hardware
self-test errors, etc.), as well as permit periodic checkup maintenance testing requiring
suspension of the coprocessors' normal operations.

Such maintenance access to the internals of a secure coprocessor, while only logical
and not physical, requires complete access to the secure coprocessor's state. Self-tests
may necessarily require destructive writes to secure RAM; even though such self-tests are
vendor supplied, we would like to prevent self-test code from accessing private or integrity-protected
user data. This poses a dilemma: the secure coprocessor state seemingly cannot
be backed up, since this permits replay attacks for applications such as electronic currency. 41
Secrets stored in secure RAM must remain private.

We can securely back up secure coprocessor state for maintenance testing and also
transfer the state of one secure coprocessor to a replacement secure coprocessor. The trick
is to use atomic transactions: state information is transactionally transferred from the source
secure coprocessor to a target secure coprocessor. Most of the secure RAM of the source
secure coprocessor is erased as a result of the transactional transfer. The only secure RAM
contents not erased are the unique authentication and public key. This is required if the
secure coprocessor is to be reused, since new code could not be loaded otherwise.

39 Alternatively, we can use tamper-protected EEPROM to store the secondary boot loader to optimize for
speed. See section 4.2.3 for a discussion of its security properties.
40 If we store the secondary boot loader in protected EEPROM, we can omit the loading/validation steps.
41 The attackers back up the state of their secure coprocessor, spend some electronic currency, and restore the
previous state. See section 3.4.

Dyad uses a simplified version of the traditional two-phase commit protocol [33, 53], 
since only two parties are involved and the write locks can be implicit. 42 The secure 
coprocessor transfer commit protocol requires an acknowledgement message from the target 
coprocessor after the source secure coprocessor (the transaction coordinator) sends the
"commit" (or "abort") message, since the source secure coprocessor log (held in the secure 
RAM) will be forcibly truncated as a result of the transfer. 

Note that the target secure coprocessor does not have to actually store all the source state 
information in its secure RAM: if all secure coprocessors have the same capacity, it will 
not have enough secure RAM. Fortunately, the state information only needs to be logically 
transferred to the target coprocessor: the target secure coprocessor can simply encrypt
the state data, write it to disk, and save just the key in its secure RAM. As an optimization,
the encryption and storage of the state data can be performed entirely by the source secure
coprocessor; only the key needs to be transactionally transferred to the backup secure
coprocessor.

After the state transfer is completed and secure RAM erased, testing may proceed. 
The secondary bootstrap code may now load in whatever vendor-supplied self-test code is 
needed, since this self-test code will not have any access to secret or integrity-protected user 
data. When the testing is done, we can restart the secure coprocessor (or a new one) and 
transactionally reload the original secure RAM state. Because state is always transferred
and never copied, such backups are not subject to replay attacks, and the testing provides
users with assurance against hardware faults.

6.4. Tolerating Hardware Faults

At first glance, it would appear that by keeping secrets only in secure coprocessors, we face 
the risk of losing those secrets when a secure coprocessor has a hardware failure. Fortunately,
by applying a modified quorum consensus technique [37, 38], we can make a secure copro- 
cessor system fault tolerant. We assume a failstop model [82]. 

An example of such a configuration would use three secure coprocessors in a group, 
all of which maintain the same secure data. Every update transaction involves two of the 
three with a secure timestamp [90, 89], so the secure data should remain identical between 
transactions. Communication among the three coprocessors is encrypted. When a secure 
coprocessor fails, a new one is added (replacing the broken one) by being initialized from 
the most up-to-date of the remaining two secure coprocessors, simultaneously updating 
the group's membership list. This update is performed transactionally, using a state 
transfer mechanism like the method described in section 6.3. If two or more coprocessors 
fail simultaneously, however, the data is unrecoverable. (Otherwise an attacker could 
separate a working trio of secure coprocessors into three groups of isolated coprocessors 
and use that to duplicate currency.) After regenerating to a triad of secure coprocessors, 
the failed coprocessor will be shunned by the regenerated group if it becomes operational 
again: attackers cannot create a new quorum by faking coprocessor failures. 

⁴² In the first phase, the transaction coordinator asks whether all entities involved in the transaction agree 
that they are able to commit and have logged the appropriate data to stable storage. After a party has 
agreed that it is willing to commit, all values involved in the transaction are inaccessible until the coordinator 
declares a "commit" or "abort." The coordinator broadcasts "commit" or "abort" during the second phase, 
and transactional modifications to values become permanent or vanish. 
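
The three-coprocessor group can be sketched as a small model. This is an illustration only, not the Dyad implementation: the names `Coprocessor`, `Group`, `write`, `read`, and `regenerate` are hypothetical, an integer counter stands in for the secure timestamp, reading from the live members and taking the newest timestamp is an assumption of the sketch, and encryption and the transactional state transfer are omitted.

```python
class Coprocessor:
    """Hypothetical stand-in for one secure coprocessor's replicated state."""

    def __init__(self, name):
        self.name = name
        self.timestamp = 0   # stands in for the secure timestamp
        self.data = None
        self.alive = True


class Group:
    """Triad of replicas; every update must reach two live members."""

    def __init__(self, names):
        self.members = [Coprocessor(n) for n in names]
        self.clock = 0

    def live(self):
        return [m for m in self.members if m.alive]

    def write(self, value):
        targets = self.live()[:2]
        if len(targets) < 2:
            raise RuntimeError("fewer than two live members: write refused")
        self.clock += 1
        for m in targets:
            m.timestamp, m.data = self.clock, value

    def read(self):
        live = self.live()
        if len(live) < 2:
            raise RuntimeError("fewer than two live members: read refused")
        # Any two members overlap the two that took the last write,
        # so the newest timestamp among them is the current value.
        return max(live, key=lambda m: m.timestamp).data

    def regenerate(self, new_name):
        survivors = self.live()
        if len(survivors) < 2:
            raise RuntimeError("two or more failures: data is unrecoverable")
        source = max(survivors, key=lambda m: m.timestamp)
        fresh = Coprocessor(new_name)
        fresh.timestamp, fresh.data = source.timestamp, source.data
        # Membership is replaced in the same step, so the failed unit is
        # shunned even if it later becomes operational again.
        self.members = survivors + [fresh]

    def is_member(self, coprocessor):
        return coprocessor in self.members
```

In this model a coprocessor that "fails" and later returns is simply absent from `members`, which is the property that prevents an attacker from assembling a second quorum out of supposedly dead units.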

In general, the number of failures F that can be tolerated can be made arbitrarily large 
by using more secure coprocessors in a group. Let there be N secure coprocessors in a 
group. Writes to secure data are considered successful if W secure coprocessors update 
their copy of the secure data, and reads from secure data are considered to have obtained 
valid data only if R secure coprocessors in the group respond with (timestamped) data. 
Dyad allows failure-recovery restores to new secure coprocessors to proceed only if there 
are at least P working secure coprocessors, where F, R, N, W, and P satisfy the equations 

$$R + W > N + F \qquad (6.1)$$

$$P > \frac{N}{2} \qquad (6.2)$$

$$P > R \qquad (6.3)$$

Equation 6.1 is the standard requirement for the number of readers and writers to overlap 
(pigeonhole principle) from quorum consensus. Equation 6.2 requires that at least half of 
the coprocessors are available for regenerating the missing (and presumed dead) members 
of a group, preventing smaller partitions from being used to clone money or other access 
capabilities. Equation 6.3 ensures that the subset of our secure coprocessor group from which 
we regenerate missing ones will contain at least one coprocessor containing the correct data, 
which can be propagated to the other coprocessors as part of the recovery/regeneration 
process, preserving the reader/writer overlap invariant for the regenerated coprocessor 
group. As part of the regeneration transaction, group membership is updated to contain 
only the secure coprocessors in the regeneration transaction. 
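
A minimal sketch of checking a proposed group configuration against equations 6.1 through 6.3, read here as R + W > N + F, P > N/2, and P > R. The function name `check_quorum` is hypothetical, and treating every comparison as strict is an assumption of this sketch rather than something the text settles.

```python
def check_quorum(n, f, r, w, p):
    """Return the list of violated constraints (empty if all hold).

    n: coprocessors in the group      f: failures to tolerate
    r: read quorum size               w: write quorum size
    p: working coprocessors required before regeneration may proceed
    """
    violations = []
    if not (r + w > n + f):
        violations.append("6.1: R + W > N + F")
    if not (2 * p > n):          # P > N/2 without floating point
        violations.append("6.2: P > N/2")
    if not (p > r):
        violations.append("6.3: P > R")
    return violations
```

For instance, `check_quorum(5, 1, 3, 4, 4)` returns an empty list, while lowering the quorums to `r=2, w=3` violates only equation 6.1.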

This technique is a simple modification of quorum consensus for fault tolerance and 
security under the secure coprocessor framework. By combining secret sharing [76, 86] 
with quorum consensus [39], replication space requirements can be reduced. 

Another approach would be to adapt striping to secure coprocessors, distributing data 
and error correction bits among several secure coprocessors. (Also see information on 
RAID [64].) This requires that every logical write to the secure data result in an atomic 
set of writes to secure coprocessors within the group, with data transmission among secure 
coprocessors encrypted. Recovery of data due to a failed secure coprocessor would operate 
in the same fashion as in classic striped systems, with the replacement coprocessor 
initialized via a transactional state transfer so it will possess the encryption keys necessary 
to communicate with its peers. 
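
The recovery step of a striped layout can be illustrated with single XOR parity. This is a sketch under stated assumptions: the text only says "error correction bits", so the choice of one parity block (in the style of classic RAID parity) is mine, the function names `stripe` and `recover` are hypothetical, and the atomic group write and inter-coprocessor encryption are left out.

```python
from functools import reduce


def xor_bytes(a, b):
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def stripe(data, n_data):
    """Split data into n_data equal-size blocks plus one XOR parity block."""
    size = -(-len(data) // n_data)              # ceiling division
    padded = data.ljust(size * n_data, b"\0")   # zero-pad the last block
    blocks = [padded[i * size:(i + 1) * size] for i in range(n_data)]
    parity = reduce(xor_bytes, blocks)
    return blocks + [parity]


def recover(blocks, lost):
    """Rebuild the block at index `lost` by XOR-ing all surviving blocks."""
    survivors = [b for i, b in enumerate(blocks) if i != lost]
    return reduce(xor_bytes, survivors)
```

Each element of the list returned by `stripe` would live on a different secure coprocessor; `recover` shows that any single missing block, data or parity, can be rebuilt from the others.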

Using multiple secure coprocessors dramatically reduces the likelihood of critical data 
being lost due to hardware failures. This enables the use of secure coprocessor technology 
for large scale and high reliability applications. I also eliminate the possibility that a single 
hardware failure would preclude properly licensed programs from running. 

Chapter 7 

Verification and Potential Failures 



Security critical systems are not just vulnerable to hardware-level attacks and simple hard- 
ware faults; the delivered hardware might have been substituted with bogus, trojan-horse 
hardware, and the system software may contain bugs. This chapter explains how users can 
verify secure coprocessor hardware, and shows how the secure coprocessor system design 
helps isolate the effects of software faults and check software. Additionally, this chapter 
analyzes the consequences of potential failures in the system and identifies the degree of 
trust that must be placed on hardware and system software vendors. 



7.1. Hardware Verification 

The self-tests that I considered in section 6.3 are vendor-provided executables. Suppose 
we wish to verify that the secure coprocessor or system software vendor is not supplying 
us with bogus secure coprocessor hardware. Can some form of testing be performed? 

By modifying the self-test procedure, we can perform limited statistical checking of 
secure coprocessor hardware. To verify that the hardware originated from the proper 
hardware vendor, the local system administrators or security officers may reset a fraction 
of the secure coprocessors and load in hardware verification software in lieu of a secondary 
bootstrap loader. This permits arbitrary secure coprocessor hardware testing code to be 
loaded. While sophisticated bogus hardware could be made to operate identically to a real 
secure coprocessor under most conditions, this software probing can, coupled with gross 
hardware verification (e.g., verifying the X-ray image of the circuit board and other physical 
controls), provide us with strong assurances that the secure coprocessor hardware is 
genuine. 
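
How much assurance a spot check gives can be estimated with a simple sampling calculation. The sketch below assumes that `sampled` units are drawn uniformly at random without replacement from a delivery of `n` units, of which `bogus` are fake, and that the verification software always exposes a fake unit it is run on. That last assumption is optimistic, since sophisticated bogus hardware may pass software probing. The function name is hypothetical.

```python
from math import comb


def detection_probability(n, bogus, sampled):
    """Probability that a random sample contains at least one bogus unit.

    Equals 1 - C(n - bogus, sampled) / C(n, sampled): one minus the chance
    that every sampled unit happens to be genuine.
    """
    if not (0 <= bogus <= n and 0 <= sampled <= n):
        raise ValueError("bogus and sampled must lie between 0 and n")
    return 1 - comb(n - bogus, sampled) / comb(n, sampled)
```

With four units, one of them bogus, testing two gives `detection_probability(4, 1, 2) == 0.5`. The same function shows how the required sample fraction grows as the suspected proportion of bogus hardware shrinks.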

Note that this testing is quasi-destructive, since the authentication secrets stored by the 
coprocessor system software vendor are lost. These coprocessors may, however, be returned 
to the system software vendor to be reinitialized with a new set of authentication secrets. 
Additional destructive testing of secure coprocessors may be performed on a spot-check 
basis for greater assurance of the authenticity of the secure coprocessor hardware. 

7.2. System Software Verification 



Having the secure coprocessor security kernel provide logical security (basic peer-to-peer 
authentication, encrypted communication channels, and private address spaces) is central 
to being able to run secure applications within a secure coprocessor. While any absolute 
proof of correctness of security kernels is outside of the scope of this thesis and such proofs 
will not be feasible for a long time (if ever), we must have some assurance of the security 
of secure coprocessor system software. 

In the Dyad system, the Mach 3.0 kernel runs in the secure coprocessor. It is a small 
security kernel with capability-based interprocess communication, configured with only a 
few device drivers necessary for communicating with the host system. Because the kernel 
code is cryptographically fingerprinted by the system software vendor and not encrypted, 
the code may be independently inspected. Though failstop bugs in the coprocessor kernel 
would not permit disclosure of secrets, it remains to be shown whether the system design 
can minimize the amount of damage caused by other kinds of kernel bugs. 

The system design isolates security-critical portions of the coprocessor security kernel, 
reduces the impact of bugs, and makes analysis easier. 

I assume that the kernel provides private address spaces using the underlying virtual 
memory hardware, a very stable technology. I also assume that the secure applications do 
not intentionally reveal their own secrets, whether explicitly or through covert channels. 
Furthermore, I assume that bugs in one part of the kernel do not have far-reaching effects, 
e.g., permit user-level code to arbitrarily modify another part of the kernel. 

Dyad uses security checkpoints to minimize the impact of bugs in the rest of the system. 
These are the security critical portions of the kernel that must bear close inspection. 
Fortunately, there are only a few modules controlling I/O between the host system and the 
secure coprocessor (the port and DMA drivers) and access to the secure RAM (the iopl 
interface for accessing the secure RAM and the sec_ram server; see section 4.2.3). 
These security-critical modules are well isolated, and provide an opportunity for carefully 
controlling and checking data flow, simplifying the code inspection task. 

The example of crypto-paging illustrates how testing is simplified. Instead of looking at 
the code for the default pager, we simply make sure that encryption is turned on whenever 
we use the DMA interface on the default pager's behalf. Similarly, for access control 
to the secure RAM, the iopl interface allows only a privileged client (the sec_mem 
server) to map the secure RAM into the client's address space, and the sec_mem server 
provides access control among the secure applications. The secure RAM's physical address 
range is not otherwise known to the kernel, and the virtual memory subsystem could not 
accidentally provide access to it unless the memory mapping entries are copied from the 
sec_mem server's address map. If we do not want to trust the virtual memory subsystem to 
prevent unauthorized access, we could provide a trivial device driver performing physical 
I/O only to the address range of the secure RAM with exclusive use by the sec_mem 
server. The sec_mem server code, of course, must also be carefully scrutinized to only 
give access to appropriate portions of the secure RAM as part of the cryptographic loading 
of a secure application's code. 
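
The idea of a security checkpoint can be shown in miniature: the check sits at the narrow interface, so the code behind it need not be inspected for this property. The sketch below is purely illustrative and is not Dyad or Mach code; `DmaCheckpoint`, `SecureRamCheckpoint`, `PolicyViolation`, and the client identifiers are all hypothetical names.

```python
class PolicyViolation(Exception):
    """Raised when a request would bypass a security checkpoint."""


class DmaCheckpoint:
    """Hypothetical wrapper around the single DMA entry point."""

    def __init__(self, pager_id):
        self.pager_id = pager_id
        self.log = []   # (requester, byte count, encrypted?) per transfer

    def transfer(self, requester, page, encrypt):
        # The only rule enforced here: pager traffic must be encrypted.
        if requester == self.pager_id and not encrypt:
            raise PolicyViolation("pager DMA requested without encryption")
        self.log.append((requester, len(page), encrypt))


class SecureRamCheckpoint:
    """Hypothetical guard modelling the rule on mapping secure RAM."""

    def __init__(self, privileged_client):
        self.privileged_client = privileged_client

    def map_secure_ram(self, client):
        if client != self.privileged_client:
            raise PolicyViolation("only the privileged client may map secure RAM")
        return "secure-ram-mapping"   # placeholder for a real mapping
```

Inspecting these few lines is enough to establish the two properties discussed above, regardless of how complicated the pager or the rest of the kernel is, provided all requests really do pass through the checkpoint.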

Because Dyad has simple secure memory interfaces and host interface, it is possible 
to focus on the security properties of the code implementing these interfaces. Rigorously 
checking this code decreases the likelihood that bugs in the Mach kernel could cause secret 
data disclosure. While this does not replace rigorous correctness proofs of the kernel code, 
we can increase our confidence that kernel bugs will not cause catastrophic disclosure of 
secrets. 

7.3. Failure Modes 

An ideal distributed security system would never fail, but any serious design must take 
failures into account. In this section, I discuss the potential failure modes of the Dyad 
system and examine the risks involved. 

I identify the potential sources of security breaches and consider their impacts. There 
are several secrets crucial to the correct operation of the overall system and their disclosure 
would have a severe impact on the system. Some of these reside only within software 
manufacturers' facilities, and others are also kept in secure coprocessors in the field. 

The most critical secret in the system is the secure-coprocessor software private key. 
This key is created at the system software manufacturing facilities and produces a digitally 
signed certificate for every new coprocessor, each certifying the public key and 
authentication puzzle as belonging to a secure coprocessor identity created by that manufacturer. 
The corresponding private key and secret authentication puzzle solution are loaded into the 
secure memory as part of the system software installation, along with the certificate. 

Disclosure of the system software manufacturer's signature key permits attackers to 
create fake secure coprocessors, and these unsecure coprocessors or software emulations 
can totally compromise the system. 

In a similar fashion, if attackers possess the secret key and authentication puzzle solution 
of a secure coprocessor, they can obtain any application-specific secrets associated with 
secure applications subsequently installed on that coprocessor.⁴³ Furthermore, attackers 
will also be able to expose secrets stored in other secure coprocessors they manage, since 
they can use an unsecure coprocessor as a transactional state transfer target. 

Coprocessor-specific secrets are only vulnerable to exposure between the time of generation 
and the time of installation; by my main axiom, it is impossible to obtain secrets after 
they are installed in a secure coprocessor. Additional security can optionally be obtained 
by requiring authorization (perhaps from the system software vendor) before engaging in 
transactional state transfers. 

One particularly security-sensitive application is electronic currency, and it is important 
to discuss how disclosures of critical secrets will compromise the system. The critical 
data is the electronic currency application authentication puzzle solution. Disclosure of 
this information permits creation of electronic cash, if access to the secure channel between 
secure coprocessors can be achieved. Since having access to the individual secure 
coprocessor secrets implies access to the application secrets, one method of increasing 
the work required to attack the system is to have the electronic currency application use 
secure channels provided by the secure coprocessor kernel (perhaps doubly encrypting 
using application-level keys as well). The kernel performs coprocessor-to-coprocessor key 
exchange using the individual secure coprocessor secrets. This forces attackers to obtain 
access to individual secure coprocessor secrets rather than just the application secrets. 

⁴³ If the attacker had logged the previous installation of secure applications, those application-specific secrets 
(and the privacy of the texts of programs themselves) are also compromised. 

Further application-specific limits can bound the amount of damage. In the case 
of electronic currency, the electronic currency application can limit the total amount of 
electronic currency that may be stored within a secure coprocessor. This limitation reduces 
the risk of losing money as a result of catastrophic hardware failure, and also reduces the 
rate at which fake electronic currency may be introduced into the system if secrets are 
compromised. Additional limits may be added to restrict the rate at which electronic funds 
can be transferred, though this only serves as a tourniquet and cannot solve the problems 
of compromised secret keys. 
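
The two limits described here, a cap on stored value and a cap on transfer rate, can be sketched as follows. The `Wallet` class, its parameter names, and the sliding-window form of the rate limit are assumptions made for illustration; the text does not specify how the limits would be implemented.

```python
class Wallet:
    """Hypothetical per-coprocessor purse with a balance cap and a rate limit."""

    def __init__(self, max_balance, max_outflow_per_window, window_seconds):
        self.max_balance = max_balance
        self.max_outflow_per_window = max_outflow_per_window
        self.window_seconds = window_seconds
        self.balance = 0
        self.sent = []   # (time, amount) of recent outgoing transfers

    def deposit(self, amount):
        if amount <= 0:
            raise ValueError("amount must be positive")
        if self.balance + amount > self.max_balance:
            raise ValueError("deposit would exceed the balance cap")
        self.balance += amount

    def withdraw(self, amount, now):
        if amount <= 0 or amount > self.balance:
            raise ValueError("invalid amount or insufficient funds")
        # Forget transfers that have aged out of the sliding window.
        self.sent = [(t, a) for t, a in self.sent
                     if now - t < self.window_seconds]
        recent = sum(a for _, a in self.sent)
        if recent + amount > self.max_outflow_per_window:
            raise ValueError("transfer would exceed the rate limit")
        self.balance -= amount
        self.sent.append((now, amount))
```

The balance cap bounds what a single hardware failure can destroy, and the rate limit bounds how quickly a compromised unit could inject forged currency. Neither repairs a compromised key.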

Similar problems occur if the underlying cryptographic system is broken. The in- 
tractability of factoring large moduli is basic to both the authentication and public key 
systems. If a modulus used in a cryptographic algorithm is factored, secrets would be 
similarly revealed. This problem is endemic to cryptographic applications in general. 

7.4. Previous Work 

Previous work on system isolation includes fences [69], which introduced the idea of using 
cryptographic checks to find system errors. Trusted computing bases form an important 
part of the "Orange Book" Trusted Computer System Evaluation Criteria [101]. Trusted 
computing bases rely on a strict security boundary between the secure environment and 
the unsecure environment: all the computer hardware and software, including the terminals, 
are considered secure, and the users are not. The system software implements the 
appropriate access control, often mandatory, to enforce policy. 







Chapter 8 
Performance 



This chapter discusses Dyad's performance. First, I examine the implementations of my 
authentication and fingerprinting algorithms. Next, I look at the overhead of crypto-paging 
relative to simple paging. 

8.1. Cryptographic Algorithms 

This section gives timing figures for my implementation of the authentication algorithm 
and fingerprinting algorithm described in chapter 5. Because the Citadel unit is a research 
prototype and its processor will be updated to a newer, faster RISC processor, my timing 
figures are for several processors: an i386SX processor running at 16 MHz; an i486DX2/66 
processor; a MIPS R3000 processor; a Power processor for an IBM RS6000/950; and 
projected figures for a PowerPC 601. Table 8.1 shows running times for the basic authentication 
algorithm and the processing rates for the fingerprint algorithm on these processors. 

The Citadel processor requires 3.45 seconds to perform zero knowledge two-way authentication 
(see section 5.1.4) to achieve a security factor of 3.18 x 10^28, using a 150 
decimal digit modulus.⁴⁴ To perform the authentication, Citadel and the host processor 
(which provides networking for the secure coprocessor) must exchange 4 messages. The 
overhead for sending a message between Citadel and the host processor is approximately 
0.96 S; much of this overhead should disappear if the device drivers in the host and in 
the Citadel-side Dyad kernel did not need to poll hardware status. (See section 4.2.1 
for a discussion of the source of this overhead.) We anticipate dramatic improvement in 
authentication time in the next generation of the Dyad hardware base. 

⁴⁴ Factoring such a modulus should require approximately 20000 MIPS-years of CPU time using contemporary 
(May 1994) factoring techniques [50, 87]. 

Algorithm             i386SX      MIPS R3000   i486 DX2/66   RS6000/950   PowerPC 601 
                      (16 MHz)    (20 MHz)                                (66 MHz) 
Authentication (mS)   3450        249          167           114          86.0 est. 
Fingerprint (MB/S)    0.410       1.14         1.42          3.99         2.70 est. 

Table 8.1. Cryptographic Algorithms Run Time 

Because the Citadel prototype coprocessor is a research prototype, its processor, an i386SX 
running at 16 MHz, is likely to be upgraded to a newer, faster processor when secure 
coprocessors become commercial products. To obtain these run times for non-Citadel 
processors, I ran the portable C-language implementations of these algorithms on test data 
on commercially available PCs and workstations (a DECstation 5000/200, an Elite 486 PC, 
and an IBM RS6000/950); the times for the PowerPC 601 are extrapolated from its SPECint 
ratings. 

The fastest fingerprinting implementation (see section 5.1.5) running on the Citadel's 
i386SX fingerprints at 410 Kbytes/sec. This assembler-coded routine uses a 65536-entry 
table (2^16) of precomputed partial residues. This code runs at the maximum possible memory 
bandwidth: for comparison, a tight assembler loop loading source memory into a register 
reads memory at a rate of less than 1.1 Mbytes/sec on the Citadel, and the fingerprint table 
look-up code reads two additional memory words per word of input data. Because the 
i386SX has no on-chip cache and the Citadel board provides no external cache memory, 
some of the memory bandwidth is expended fetching instructions. A more space-efficient 
assembler language implementation uses a much smaller 256-entry table and fingerprints 
at 226 Kbytes/sec, or about 55% of the speed of the first implementation. On Citadel, the 
most tightly tuned C language implementations of the fingerprint algorithm achieve only 
224 Kbytes/sec and 204 Kbytes/sec for large and small tables respectively, largely because 
of the inability of the compiler to avoid register spills into memory and to optimally use 
(and in some cases, even generate) some i386 instructions. 

(The residue table initialization algorithm is described in section 5.2.5. For the large 
table, the time required is approximately 1.23 S; the time for the small table is negligible 
(42 mS). Note that this is a one-time charge.) 

My experiments recommend that the smaller assembler coded version be used for most 
cases. The large table version is useful where the same irreducible polynomial is used 
for a large amount of data (perhaps when checking disk contents); the small version wins 
when the irreducible polynomial is changed often, or where there are tight real-memory 
requirements (such as in the Citadel prototype). When cache memory is added to future 
generations of Citadel, the smaller-table version will gain in performance relative to the 
larger-table version because the table of partial residues should easily fit within the cache. 
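
The role of the table of precomputed partial residues can be illustrated with a small polynomial-residue fingerprint over GF(2). This is a sketch, not the thesis code: it assumes a fixed degree-32 modulus x^32 + x^7 + x^3 + x^2 + 1 (assumed irreducible here; a real fingerprinting scheme would choose its irreducible polynomial as described in chapter 5), it consumes input one byte at a time with a 256-entry table, and all names are hypothetical. A 65536-entry table would presumably consume 16 bits per lookup in the same way.

```python
DEGREE = 32
# x^32 + x^7 + x^3 + x^2 + 1; assumed irreducible for this sketch.
POLY = (1 << DEGREE) | 0x8D
MASK = (1 << DEGREE) - 1


def fingerprint_bitwise(data):
    """Reference version: residue of the message polynomial, bit by bit."""
    r = 0
    for byte in data:
        for i in range(7, -1, -1):
            r = (r << 1) | ((byte >> i) & 1)
            if r >> DEGREE:          # degree reached 32: subtract the modulus
                r ^= POLY
    return r


def build_table():
    """table[t] = (t * x^32) mod POLY for every 8-bit value t."""
    table = []
    for t in range(256):
        v = t << DEGREE
        for bit in range(DEGREE + 7, DEGREE - 1, -1):
            if (v >> bit) & 1:
                v ^= POLY << (bit - DEGREE)
        table.append(v)
    return table


TABLE = build_table()


def fingerprint_table(data):
    """Table-driven version: one lookup per input byte."""
    r = 0
    for byte in data:
        # The top 8 bits of r overflow past x^31 when r is multiplied by
        # x^8; their reduced contribution comes from the table.
        r = TABLE[r >> (DEGREE - 8)] ^ ((r << 8) & MASK) ^ byte
    return r
```

Both functions compute the same residue; the table version replaces eight shift-and-conditional-XOR steps with a single lookup, which is why table size trades memory against speed. Changing the polynomial requires rebuilding the table, the one-time initialization cost noted above.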

Smaller code size is desirable for security code. When the code is smaller, the system 
is easier to verify and less likely to contain bugs. The key exchange routines consist of 
80 lines of C code. The authentication routines consist of 75 lines of C code. Both the 
key exchange and the authentication code are written on top of a library of routines for 
calculating with arbitrarily large integers. The fingerprinting code consists of 211 lines of 
C code and 160 lines of i386 assembler. My total core routines are relatively small: 366 
lines of C code and 160 lines of assembler. 

8.2. Crypto-Paging 

The overhead for crypto-paging is too small to measure, since both crypto-paging and normal 
paging activity go through the hardware DES machinery and DMA channels. Overhead 
is incurred only when the encryption keys are set. This happens every time a page is written out 
to host memory, where a (small) encrypted system disk image resides. 

Additionally, the host system imposes limits on the number of pages that can be transferred, 
since we cannot guarantee that the disk image will reside in physically contiguous 
memory. This means that even if paging were not encrypted, the number of bytes copied per 
DMA transfer is most likely to be a single virtual memory page (4K) anyway, and the 
currently high per-DMA-transfer overhead (0.96 S) cannot be amortized over many pages 
of memory. 

Chapter 9 

Conclusion and Future Work 

The problem of providing security properties for widely distributed systems is a difficult 
one that must be solved before applications with strong privacy and integrity demands, such 
as electronic commerce applications, can be safely deployed. All cryptographic protocols 
require secrets to be kept, and all access control software assumes the ability to maintain the 
integrity of the access control database. These assumptions must be satisfied in any serious 
secure distributed system; providing these security properties in a rigorous and complete 
way is impossible without some form of physically secure hardware. 

In this thesis I have shown that it is possible to provide very strong security guarantees 
without putting the entire computer in a locked room. By adding secure coprocessors to 
normal workstations or PCs, overall security may be bootstrapped from a core set of security 
properties guaranteed by secure coprocessor hardware. Cryptographic techniques to check 
integrity and to protect privacy can provide much stronger system-level security guarantees 
than were previously possible. 

Furthermore, by applying transaction processing techniques to security, I built electronic 
currency systems where money cannot be created or destroyed accidentally. By using 
quorum consensus and transactions, I designed fault tolerant secure coprocessor systems. 

I have analyzed the native security properties of various components of the soft- 
ware/hardware system, and arranged them into a security hierarchy; furthermore, I used 
cryptographic techniques to enhance security properties. This separation of the system 
architectural components by their security properties permits secure-system designers to 
reason realistically about what kinds of security properties are actually achievable. 

The contributions of this thesis may be summarized as follows: 

• end-to-end analysis of the security properties of the system components, both at the 
hardware level and at the software level; 

• design and analysis of combined hardware-software architecture for bootstrapping 
security guarantees throughout the system, using cryptographic techniques at the 
system component boundaries (including crypto-paging and crypto-sealing); 

• demonstration of the feasibility of the architecture by constructing a working prototype 
system, providing insights into system design issues that restrict the overall 
system architecture; 

• design, analysis, implementation, and measurement of cryptographic protocols for 
zero-knowledge authentication and key exchange, suitable for use in security critical 
environments; 

• demonstrating that secure coprocessors may be statistically checked against vendor 
fraud; 

• showing how secure coprocessors may be operated in a fault-tolerant manner; and 

• designing solutions to exemplar electronic commerce applications, including building 
an electronic currency application and analyzing how cryptographic stamps may be 
used. 

Secure coprocessors exist today and can solve many pressing distributed security problems, 
but there remain several challenges to be solved by future developers of secure 
coprocessor technology. The need for a general, low-cost distributed transaction system 
is apparent, and it remains to be shown that one can be built to run efficiently within the 
secure coprocessor environment. Tools for automating the task of splitting applications 
are needed, and the issue of providing operating system support for split secure-coprocessor 
applications remains to be fully explored. Most importantly, many secure applications 
building on secure coprocessors remain to be discovered. 